Python's Three Power Tools

爱吃窝窝头 2023-11-30

4. Python's Three Power Tools


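Python's three power tools are iterators, generators, and decorators. Used well, they help you write concise, memory-efficient Python programs. The built-in module itertools provides iterator-related operations. This part collects 15 interesting examples.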

1 Find the position of the n-th occurrence

def search_n(s, c, n):
    size = 0
    for i, x in enumerate(s):
        if x == c:
            size += 1
        if size == n:
            return i
    return -1
print(search_n("fdasadfadf", "a", 3))  # prints 7, correct
print(search_n("fdasadfadf", "a", 30))  # prints -1, correct
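
The same search can also be written lazily with itertools.islice: generate the matching indices and skip ahead to the n-th one. This is only a sketch; the name search_n_lazy is introduced here for illustration, and it assumes n >= 1.

```python
from itertools import islice

def search_n_lazy(s, c, n):
    """Return the index of the n-th occurrence of c in s, or -1 (assumes n >= 1)."""
    # Lazily produce the indices where c occurs.
    hits = (i for i, x in enumerate(s) if x == c)
    # Skip the first n-1 hits and take the next one; fall back to -1 if there is none.
    return next(islice(hits, n - 1, None), -1)

print(search_n_lazy("fdasadfadf", "a", 3))   # 7
print(search_n_lazy("fdasadfadf", "a", 30))  # -1
```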

2 First n terms of the Fibonacci sequence

def fibonacci(n):
    a, b = 1, 1
    for _ in range(n):
        yield a
        a, b = b, a + b
list(fibonacci(5)) # [1, 1, 2, 3, 5]
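
Because a generator is lazy, the length limit does not have to live inside the function. A possible variant (fibonacci_stream is a name made up for this sketch) yields forever and lets the caller cut it off with itertools.islice:

```python
from itertools import islice

def fibonacci_stream():
    """Yield Fibonacci numbers forever: 1, 1, 2, 3, 5, ..."""
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

# Take only the first five values from the endless stream.
first_five = list(islice(fibonacci_stream(), 5))
print(first_five)  # [1, 1, 2, 3, 5]
```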

3 Find all duplicate elements

from collections import Counter

def find_all_duplicates(lst):
    c = Counter(lst)
    return list(filter(lambda k: c[k] > 1, c))

find_all_duplicates([1, 2, 2, 3, 3, 3]) # [2,3]

4 Combined counting

Counter objects support arithmetic operations with each other.

from collections import Counter

a = ['apple', 'orange', 'computer', 'orange']
b = ['computer', 'orange']

ca = Counter(a)
cb = Counter(b)
# Counter objects can be added together
ca + cb # Counter({'orange': 3, 'computer': 2, 'apple': 1})
# Generalize further: count the elements of any number of lists

def sumc(*c):
    if (len(c) < 1):
        return
    mapc = map(Counter, c)
    s = Counter([])
    for ic in mapc: # ic is a Counter object
        s += ic
    return s
#Counter({'orange': 3, 'computer': 3, 'apple': 1, 'abc': 1, 'face': 1})
sumc(a, b, ['abc'], ['face', 'computer'])
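
The accumulation loop can also be expressed with the built-in sum, using an empty Counter as the start value. This is a sketch under the assumption that an empty Counter (rather than None) is an acceptable result when no lists are passed; sum_counters is a hypothetical name.

```python
from collections import Counter

def sum_counters(*lists):
    """Count elements across any number of lists (hypothetical helper)."""
    # sum() starts from an empty Counter and adds one Counter per input list.
    return sum((Counter(lst) for lst in lists), Counter())

total = sum_counters(['apple', 'orange', 'computer', 'orange'],
                     ['computer', 'orange'], ['abc'], ['face', 'computer'])
print(total)
```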

5 groupby: grouping by a single field

Weather records:

a = [{'date': '2023-08-04', 'weather': 'cloud'},
{'date': '2023-08-03', 'weather': 'sunny'},
{'date': '2023-08-02', 'weather': 'cloud'}]

Group and summarize by the weather field:

from itertools import groupby
for k, items in groupby(a,key=lambda x:x['weather']):
    print(k)

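The output shows that the grouping failed! The reason: groupby only groups consecutive items with equal keys, so the data must be sorted by the grouping field first. This is an easy trap to fall into.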

cloud
sunny
cloud
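
The behavior is easier to see on a plain string. The snippet below is only an illustration (the input "aabba" is made up): groupby starts a new group whenever the key changes.

```python
from itertools import groupby

# groupby starts a new group every time the key changes,
# so equal keys that are not adjacent end up in separate groups.
unsorted_keys = [k for k, _ in groupby("aabba")]
sorted_keys = [k for k, _ in groupby(sorted("aabba"))]

print(unsorted_keys)  # ['a', 'b', 'a']
print(sorted_keys)    # ['a', 'b']
```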

Fixed code:

a.sort(key=lambda x: x['weather'])
for k, items in groupby(a,key=lambda x:x['weather']):
    print(k)
    for i in items:
        print(i)

Output:

cloud
{'date': '2023-08-04', 'weather': 'cloud'}
{'date': '2023-08-02', 'weather': 'cloud'}
sunny
{'date': '2023-08-03', 'weather': 'sunny'}
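
If the groups are needed after the loop, they have to be copied out, because each group iterator is consumed as groupby advances. One way to do that, sketched with the records from this example (the names records, weather_of and grouped are assumptions of this sketch):

```python
from itertools import groupby

records = [{'date': '2023-08-04', 'weather': 'cloud'},
           {'date': '2023-08-03', 'weather': 'sunny'},
           {'date': '2023-08-02', 'weather': 'cloud'}]

def weather_of(record):
    return record['weather']

# sorted() returns a new list, so the original order of records is untouched.
grouped = {k: [r['date'] for r in g]
           for k, g in groupby(sorted(records, key=weather_of), key=weather_of)}
print(grouped)  # {'cloud': ['2023-08-04', '2023-08-02'], 'sunny': ['2023-08-03']}
```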

6 itemgetter and key functions

Besides a lambda, the key function used by sort and groupby can be written more briefly with itemgetter:

a = [{'date': '2023-08-04', 'weather': 'cloud'},
{'date': '2023-08-02', 'weather': 'sunny'},
{'date': '2023-08-01', 'weather': 'cloud'}]
from operator import itemgetter
from itertools import groupby
a.sort(key=itemgetter('weather'))
for k, items in groupby(a, key=itemgetter('weather')):
    print(k)
    for i in items:
        print(i)

Result:

cloud
{'date': '2023-08-04', 'weather': 'cloud'}
{'date': '2023-08-01', 'weather': 'cloud'}
sunny
{'date': '2023-08-02', 'weather': 'sunny'}

7 groupby: grouping by multiple fields

itemgetter is a class. itemgetter('weather') returns a callable object, and it accepts more than one field name:

from operator import itemgetter
from itertools import groupby
a.sort(key=itemgetter('weather', 'date'))
for k, items in groupby(a, key=itemgetter('weather')):
    print(k)
    for i in items:
        print(i)

The result is as follows; a is sorted by both weather and date, while the grouping is still by weather:

cloud
{'date': '2023-08-01', 'weather': 'cloud'}
{'date': '2023-08-04', 'weather': 'cloud'}
sunny
{'date': '2023-08-02', 'weather': 'sunny'}

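Note the subtle difference from the previous result: within each group the records are now ordered by date, which is usually the output we want.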
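
To group on both fields as well, the same multi-field itemgetter can be passed to groupby, in which case each key is a tuple. A sketch with made-up records (the city field and its values are hypothetical, added only so that some groups contain more than one record):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records; 'city' exists only for this illustration.
records = [
    {'city': 'A', 'date': '2023-08-01', 'weather': 'cloud'},
    {'city': 'B', 'date': '2023-08-01', 'weather': 'cloud'},
    {'city': 'A', 'date': '2023-08-02', 'weather': 'sunny'},
    {'city': 'B', 'date': '2023-08-02', 'weather': 'cloud'},
]

key = itemgetter('weather', 'date')  # returns a (weather, date) tuple
records.sort(key=key)
groups = {k: [r['city'] for r in g] for k, g in groupby(records, key=key)}
for k, cities in groups.items():
    print(k, cities)
```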

8 Computing and aggregating at once with sum

The first argument of Python's aggregate functions sum, min, and max is an iterable. They are typically used like this:

a = [4,2,5,1]
sum([i+1 for i in a]) # 16

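The list comprehension [i+1 for i in a] first creates a temporary list of the same length as a, and only then does sum aggregate it.

If a has tens of millions of elements, building such a temporary list is wasteful. It is better to compute and aggregate at the same time, with a small change: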

a = [4,2,5,1]
sum(i+1 for i in a) # 16

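Here i+1 for i in a is shorthand for (i+1 for i in a); the parentheses can be dropped when a generator expression is the only argument of a call. It produces a generator object, as shown below: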

(i+1 for i in a)

<generator object <genexpr> at 0x0000023BD58884A0>

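The generator yields one element per iteration step; sum adds it to the running total and moves on to the next step until the generator is exhausted.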
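
The difference in memory use can be checked roughly with sys.getsizeof. This is a sketch only: getsizeof measures the container object itself, not the integers it refers to, and the list length of 100,000 is an arbitrary choice.

```python
import sys

a = list(range(100_000))

as_list = [i + 1 for i in a]   # materializes 100,000 results up front
as_gen = (i + 1 for i in a)    # produces results one at a time

# The list grows with the input; the generator object stays small.
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))  # True
print(sum(i + 1 for i in a) == sum(as_list))            # True
```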

9 Splitting a list into groups (generator version)

The list partitioning example in Chapter 1 (Python basics) used a generator, as follows:

from math import ceil
def divide_iter(lst, n):
    if n <= 0:
        yield lst
        return
    i, div = 0, ceil(len(lst) / n)
    while i < n:
        yield lst[i * div: (i + 1) * div]
        i += 1
list(divide_iter([1, 2, 3, 4, 5], 0)) # [[1, 2, 3, 4, 5]]
list(divide_iter([1, 2, 3, 4, 5], 2)) # [[1, 2, 3], [4, 5]]
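
divide_iter relies on len() and slicing, so it needs a real list. For arbitrary iterables, a related approach is to fix the chunk size instead of the number of parts. The helper below is a hypothetical sketch, not part of the original chapter:

```python
from itertools import islice

def chunked(iterable, size):
    """Yield lists of at most `size` items (hypothetical helper, size >= 1)."""
    it = iter(iterable)
    while True:
        # Pull the next `size` items; an empty list means the input is exhausted.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

print(list(chunked(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```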

10 Fully flattening a list (generator version)

# Flatten a nested list into a single-level list
a=[1,2,[3,4,[5,6],7],8,["python",6],9]
def function(lst):
    for i in lst:
        if isinstance(i, list):
            yield from function(i)
        else:
            yield i
print(list(function(a))) # [1, 2, 3, 4, 5, 6, 7, 8, 'python', 6, 9]

[1, 2, 3, 4, 5, 6, 7, 8, 'python', 6, 9]

11 A decorator that times a function

# Example decorator that measures a function's execution time
import time
def timing_func(fn):
    def wrapper():
        start=time.time()
        fn()  # run the function passed in as fn
        stop=time.time()
        return (stop-start)
    return wrapper
@timing_func
def test_list_append():
    lst=[]
    for i in range(0,100000):
        lst.append(i)
@timing_func
def test_list_compre():
    [i for i in range(0,100000)]  # list comprehension
a=test_list_append()
c=test_list_compre()
print("test list append time:",a)
print("test list comprehension time:",c)
print("append/compre:",round(a/c,3))

test list append time: 0.006039142608642578
test list comprehension time: 0.0050089359283447266
append/compre: 1.206
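
The wrapper above takes no arguments and discards the function's return value. A more general variant could look like the sketch below; the names timed and build_squares are hypothetical, and returning a (result, elapsed) tuple is a design choice of this sketch.

```python
import time
from functools import wraps

def timed(fn):
    """Hypothetical variant: return (result, elapsed_seconds) for any call."""
    @wraps(fn)  # keep fn.__name__ and its docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = fn(*args, **kwargs)
        return result, time.perf_counter() - start
    return wrapper

@timed
def build_squares(n):
    return [i * i for i in range(n)]

squares, elapsed = build_squares(1000)
print(build_squares.__name__, len(squares), elapsed >= 0)
```

time.perf_counter is used instead of time.time because it is intended for measuring short intervals.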

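12 A decorator that counts exceptions and measures elapsed time

Write a decorator that reports how much time has elapsed once an exception has occurred a given number of times.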

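import time

def excepter(f):
    i = 0              # number of exceptions seen so far
    t1 = time.time()   # timing starts when the function is decorated
    def wrapper():
        nonlocal i
        try:
            f()
        except Exception as e:
            i += 1
            print(f'{e.args[0]}: {i}')
            t2 = time.time()
            if i == n:  # n is a module-level variable defined before the calls
                print(f'spending time:{round(t2-t1,2)}')
    return wrapper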

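The nonlocal keyword is used in nested functions; here it declares that i is not local to wrapper but refers to the i of the enclosing excepter scope.

Without the declaration, i += 1 makes i a local variable of wrapper. Because i is then read before it has been assigned a value, Python raises UnboundLocalError.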
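
A small counter makes the difference visible. The two factory functions below are made up for this illustration; the only difference between them is the nonlocal line.

```python
def make_counter_broken():
    count = 0
    def bump():
        count += 1  # count is treated as local to bump, but has no value yet
        return count
    return bump

def make_counter():
    count = 0
    def bump():
        nonlocal count  # rebind the count of the enclosing function
        count += 1
        return count
    return bump

try:
    make_counter_broken()()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

counter = make_counter()
print(error_name)            # UnboundLocalError
print(counter(), counter())  # 1 2
```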

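Apply the excepter decorator defined above; n is the number of exception occurrences to wait for.

Two common exceptions are tested: division by zero and list index out of range.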

n = 10 # number of exceptions to count
@excepter
def divide_zero_except():
    time.sleep(0.1)
    j = 1/(40-20*2)

# test division by zero
for _ in range(n):
    divide_zero_except()
@excepter
def outof_range_except():
    a = [1,3,5]
    time.sleep(0.1)
    print(a[3])
# test index out of range
for _ in range(n):
    outof_range_except()

The printed output is as follows:

division by zero: 1
division by zero: 2
division by zero: 3
division by zero: 4
division by zero: 5
division by zero: 6
division by zero: 7
division by zero: 8
division by zero: 9
division by zero: 10
spending time:1.09
list index out of range: 1
list index out of range: 2
list index out of range: 3
list index out of range: 4
list index out of range: 5
list index out of range: 6
list index out of range: 7
list index out of range: 8
list index out of range: 9
list index out of range: 10
spending time:1.09
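
The decorator above reads n from a module-level variable. One alternative is a decorator factory that receives n as an argument. The sketch below is hypothetical (excepter_n and the count attribute are names invented here) and omits the sleep calls to keep it fast:

```python
import time
from functools import wraps

def excepter_n(n):
    """Hypothetical decorator factory: report elapsed time at the n-th exception."""
    def decorator(f):
        count = 0
        start = time.perf_counter()  # timing starts at decoration time

        @wraps(f)
        def wrapper(*args, **kwargs):
            nonlocal count
            try:
                return f(*args, **kwargs)
            except Exception as e:
                count += 1
                wrapper.count = count  # exposed so callers can inspect it
                print(f'{e}: {count}')
                if count == n:
                    print(f'spending time:{time.perf_counter() - start:.2f}')
        wrapper.count = 0
        return wrapper
    return decorator

@excepter_n(3)
def divide_zero():
    return 1 / 0

for _ in range(3):
    divide_zero()
```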

13 A decorator that measures running time

# Example decorator that measures a function's execution time
import time
def timing(fn):
    def wrapper():
        start=time.time()
        fn()  # run the function passed in as fn
        stop=time.time()
        return (stop-start)
    return wrapper
@timing
def test_list_append():
    lst=[]
    for i in range(0,100000):
        lst.append(i)
@timing
def test_list_compre():
    [i for i in range(0,100000)]  # list comprehension
    
a=test_list_append()
c=test_list_compre()
print("test list append time:",a)
print("test list comprehension time:",c)
print("append/compre:",round(a/c,3))

test list append time: 0.008554697036743164
test list comprehension time: 0.004532337188720703
append/compre: 1.887

This example is the same as example 11, the decorator for timing a function.
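
A single run, as measured above, is noisy; the two runs in this chapter give ratios of 1.206 and 1.887. The standard library's timeit module repeats the call many times, which tends to give steadier numbers. A sketch, where the function names and number=50 are arbitrary choices:

```python
import timeit

def list_append():
    lst = []
    for i in range(10_000):
        lst.append(i)
    return lst

def list_comprehension():
    return [i for i in range(10_000)]

# number=50 is an arbitrary choice; raise it for steadier results.
t_append = timeit.timeit(list_append, number=50)
t_compre = timeit.timeit(list_comprehension, number=50)
print("append/compre:", round(t_append / t_compre, 3))
```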

14 Decorators in plain terms

Here is another decorator:

def call_print(f):
    def g():
        print('you\'re calling %s function'%(f.__name__,))
    return g

Apply the call_print decorator:

@call_print
def myfun():
    pass
@call_print
def myfun2():
    pass

Calling myfun() and myfun2() prints:

myfun()

you're calling myfun function

myfun2()

you're calling myfun2 function

Using call_print

As you can see, placing @call_print above any newly defined function makes every call print one line naming the function being called.
Why is that? Look again at the call_print function (with the @ in front it acts as a decorator):

def call_print(f):
    def g():
        print('you\'re calling %s function'%(f.__name__,))
    return g

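It must accept a function f and return another function g. Note that g here only prints and never calls f; a practical decorator would usually call f inside g.

The essence of a decorator

In essence, the decorator syntax is equivalent to the following way of calling: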

def myfun():
    pass
def myfun2():
    pass
def call_print(f):
    def g():
        print('you\'re calling %s function'%(f.__name__,))
    return g

The following lines are the most important part:

myfun = call_print(myfun)
myfun2 = call_print(myfun2)

Is that clear? call_print(myfun) returns a function, and that function is then assigned back to the name myfun.
Calling myfun and myfun2 again gives:

myfun()

you're calling myfun function

myfun2()

you're calling myfun2 function

As you can see, this has exactly the same effect as the decorator. The decorator syntax is more compact, so you do not have to write the explicit assignments myfun = call_print(myfun) and myfun2 = call_print(myfun2), but the wrapping it hides can be hard to grasp at first glance.
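
In practice a decorator usually also runs the function it wraps and keeps its name. A hedged sketch of such a variant (call_print_and_run and add are names made up here, not part of the original example):

```python
from functools import wraps

def call_print_and_run(f):
    """Hypothetical variant of call_print that also runs the wrapped function."""
    @wraps(f)  # without this, add.__name__ would be 'g'
    def g(*args, **kwargs):
        print("you're calling %s function" % f.__name__)
        return f(*args, **kwargs)
    return g

@call_print_and_run
def add(x, y):
    return x + y

print(add(1, 2))     # prints the message, then 3
print(add.__name__)  # add
```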

15 A custom descending iterator

# Write an iterator that repeatedly decrements a positive integer by 1 until it reaches 0.
from collections.abc import Iterator

class Descend(Iterator):
    def __init__(self,N):
        self.N=N
        self.a=0  # lower bound at which iteration stops
    def __iter__(self):
        return self
    def __next__(self):
        if self.a<self.N:
            self.N-=1
            return self.N
        raise StopIteration
        
descend_iter=Descend(10)
print(list(descend_iter))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

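Key points:

1 The method name __next__ is fixed by the iterator protocol; it holds the custom iteration logic.

2 raise StopIteration: this is how an iterator signals that it is exhausted. for loops and list() catch the exception and stop; it does not abort the program.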
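
For comparison, the same sequence can be produced by a generator function, where Python raises StopIteration automatically when the function returns. The sketch below (descend is a name chosen for this illustration) also calls next() by hand to show the exception:

```python
def descend(n):
    """Generator version: yield n-1, n-2, ..., 0."""
    while n > 0:
        n -= 1
        yield n

it = descend(3)
print(next(it), next(it), next(it))  # 2 1 0
try:
    next(it)  # nothing left, so StopIteration is raised
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)          # True
print(list(descend(10)))  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```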