py4e - Data Structure

字串 (String)

find：這可以找出特定字元的位置 (index)。
len：計算字串的長度。
strip：去除字串前後的空白。他還有兄弟 lstrip 跟 rstrip，顧名思義就是只去除左邊或右邊的空白。
lower 或 upper：把字串變成全部小寫或全部大寫。
replace("old", "new")：用新字元替換舊字元。
startswith 和 endswith：判斷字串是否使用特定的前綴或後綴。

以下為簡單的範例：

text = "X-DSPAM-Confidence:    0.8475"
pos = text.find(' ')
textLen = len(text)
print(pos, textLen)
print(text[pos : textLen].strip())

tip

Python 並未提供字串反轉的方法，但可以透過 [] 裁切來達到字串反轉：

Reverse string
str = '12345'

new_str = str[::-1]

# '12345'
print(str)
# '54321'
print(new_str)

文件 (File)

Python 可以讀取文件，以下範例為使用課程提供之純文字文件作範例，如果想自己練習的話，可以自己建一支 txt 檔案。

open：開啟一個文件，可以加入第二個參數，比如 r 為唯讀、w 為編輯 (會從開啟處覆蓋掉舊內容)、a 為續寫。
read：檔案開啟後，逐行讀取檔案。readline 也可以讀取整行資料，但他只讀取一行，read 是逐行讀完整個檔案。

以下為簡單範例，會逐行印出去除左右空白且轉為大寫的字串：

fname = input("Enter file name: ")
fh = open(fname)
for line in fh:
    lh = line.strip().upper()
    print(lh)

綜合練習 (Meta b2e course)

讀取文件

def read_file(file_name):
    with open(file_name, 'r') as file:
        f_content = file.read()
        print(f_content)
        return f_content

讀取文件並把每行作為一個元素存成 list

def read_file_into_list(file_name):
    with open(file_name, 'r') as file:
        f_list = file.readlines()
        return f_list

新增 B 文件並寫入 A 文件的第一行

def write_first_line_to_file(file_contents, output_filename):
    with open(output_filename, 'w') as file:
        first_line = file_contents.splitlines()[0] + '\n'
        file.write(first_line)

只存取文件偶數行

def read_even_numbered_lines(file_name):
    with open(file_name, 'r') as file:
        lines = file.readlines()
        even_lines = []
        counts = 0
        for line in lines:
            counts += 1
            if counts % 2 == 0:
                even_lines.append(line)
        return even_lines

反轉文件清單

def read_file_in_reverse(file_name):
    with open(file_name, 'r') as file:
        lines = file.readlines()
        lines.reverse()
        return lines

列表 (List)

用 JavaScript 的思維想就是陣列，基本操作跟字串大差不差，但列表中的元素可以直接修改，字串不行。

list
list_example = [1, 2, 3, 4, 5]

lst["start" : "end"]：索引或裁切，這個同樣適用於字串。
max、min、sum：取列表的最大、最小或值得總和。
append、insert：前者為向列表後插入一個元素；後者接受兩個參數，第一個是插入位置，第二個是插入元素。
pop、remove：前者不加參數時會直接從列表後方取出最後一個元素並從原列表中山出該元素，帶參數則是取出指定位置參數；後者會去搜尋整個列表符合參數的部分，但僅會刪除遇到第一個符合的部分。
index：可以返回元素索引值，也可以用在字串。
lst.sort() 或 sorted(lst)：前者是對原列表做排序，後者是返回一個排序後的列表。
reverse：反轉列表。

以下為簡單範例，把檔案中的單字逐一送入變成列表：

fh = open('romeo.txt')
lst = list()
for line in fh:
    subLst = line.strip().split()
    for i in range(len(subLst)):
        lst.append(subLst[i])
lst = list(set(lst))
lst.sort()
print(lst)

字典 (dictionary)

用 JavaScript 的方式講，就是物件，同樣可以用 { } 宣告或是用 dict()。

字典有趣的地方是每個元素都由鍵 (key) 與值 (value) 組成，這可以更靈活地去操縱比較複雜的資料。
可以用 dic["key"] 的方式去取出存在某個 key 下面的值來使用，也可以用這個方法來輕鬆新增或修改。接著下面為常用的字典方法。

keys：返回所有鍵。
values：返回所有值。
items：一個一個返回鍵值對。
clear：清空字典。
get：獲取指定 key 的值，如果沒有這個 key 就給他默認值 (詳見範例)。以下為一個簡單的範例，用來找出發信最多的信箱以及其次數：

fh = open('mbox-short.txt')
dic = dict()
maxNum = 0
maxEmail = ''
for line in fh:
    if not line.startswith('From'):
        continue
    elif line.startswith('From:'):
        continue
    else:
        words = line.strip().split()
        dic[words[1]] = dic.get(words[1], 0) + 1
for i in dic:
    if dic[i] > maxNum:
        maxNum = dic[i]
        maxEmail = i
print(maxEmail, maxNum)

tip

set

python 中有一種資料格式跟字典很像，同樣是以定義，稱為 set。

fruit_set = {'apple', 'banana', 'orange'}

# {'banana', 'orange', 'apple'}
print(fruit_set)

set 跟 dictionary 的差異在於 set 是一個無序且唯一的集合，其中的元素不能重複 (這點跟 dictionary 每個 key 必須為唯一一樣)，它的存在主要用於快速資料查找。而且 set 中的元素只能通過迭代來訪問，因為它沒有索引，所以無法使用如 fruit_set[0] 之類的方式取值。

元組 (tuples)

元組最大的特性就是不可修改，這種只能讀而無法修改的特性可以儲存那些怕更動到的資料。字典中的 items 方法回傳的 key-value pair 就是元組。

tuple
tuple_example = (1, 2, 3, 4, 5)

以下為一個簡單的範例，這裡在最後用 items 把字典拆成元組並透過 sorted 進行排序：

fh = open('mbox-short.txt')
dic = dict()
for line in fh:
    if not line.startswith('From'):
        continue
    elif line.startswith('From:'):
        continue
    else:
        colonIndex = line.index(':')
        hr = line[colonIndex - 2 : colonIndex]
        dic[hr] = dic.get(hr, 0) + 1
ls = sorted(dic.items())
for k, v  in ls:
    print(k, v)

Comprehension (推導式)

基本語法是 [expression for item in iterable if condition] (以 List comprehension 為例)：

expression：可以是運算式，也可以是 item 本身。
item：代表了 iterable 中的當前元素，在 expression 中被引用。
iterable：一個可以進行迭代的對象。
condition：當 condition 為真（True）時，相應的 item 才會被處理。

Example
doubled_evens = [i * 2 for i in range(10) if i % 2 == 0]

# [0, 4, 8, 12, 16]
print(doubled_evens)

List comprehension

Without comprehension
doubled_numbers = []
for i in range(10):
    doubled_numbers.append(i * 2)

List comprehension
doubled_numbers = [i * 2 for i in range(10)]

# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(doubled_numbers)

Dictionary comprehension

Without comprehension
squares_dict = {}
for i in range(5):
    squares_dict[i] = i**2

List comprehension
squares_dict = {i: i**2 for i in range(5)}

# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
print(squares_dict)

Set comprehension

Without comprehension
import math
sqrt_set = set()
for i in range(10):
    sqrt_set.add(int(math.sqrt(i)))

List comprehension
import math
sqrt_set = {int(math.sqrt(i)) for i in range(10)}

# {0, 1, 2, 3}
print(sqrt_set)

Generator Expression

類似於列表推導式的表達式，但它返回的是一個生成器（generator）對象而不是列表。
他的特點主要有兩項：

惰性求值（Lazy Evaluation）：生成器表達式並不立即計算每個元素的值，它們只在迭代時才計算。這表示它可以用於非常大的數據。
節省內存：相比列表推導式，生成器表達式不會創建一個完整的列表來存儲所有元素，因此使用的內存更少。(但也因此無法獲得資料長度)

squares_of_evens = (x**2 for x in range(10) if x % 2 == 0)

for square in squares_of_evens:
    print(square)

字串 (String)​

文件 (File)​

列表 (List)​

字典 (dictionary)​

set​

元組 (tuples)​

Comprehension (推導式)​

List comprehension​

Dictionary comprehension​

Set comprehension​

Generator Expression​

參考資料​