Python Dictionaries and Sets

Advanced Python Guide

Published: 2018-08-03

Updated: 2020-12-09

What you learn from books always feels shallow; to truly understand something, you have to practice it yourself.

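In the introductory posts we covered the basics of dictionaries and sets. This post introduces some more advanced techniques that come up often in real work. Used well, they can simplify code considerably and make it more readable.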

1. Dictionaries

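1.1 fromkeys

The fromkeys method creates a dictionary from a given collection of keys. If no value is specified, every value defaults to None. It is commonly used to initialize a set of key-value pairs that are then updated as needed.

Without specifying a value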

In [1]: dict.fromkeys(['a', 'b', 'c'])
Out[1]: {'a': None, 'b': None, 'c': None}

With a specified value

In [2]: dict.fromkeys(['a', 'b', 'c'], 0)
Out[2]: {'a': 0, 'b': 0, 'c': 0}
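
One caveat worth knowing when the value passed to fromkeys is mutable: every key ends up referring to the same object. The following is a minimal sketch of that behavior; the variable names shared and separate are only illustrative.

```python
# All keys share the SAME list object when a mutable default is passed.
shared = dict.fromkeys(['a', 'b'], [])
shared['a'].append(1)
print(shared)        # {'a': [1], 'b': [1]}

# A dict comprehension creates one fresh list per key instead.
separate = {key: [] for key in ['a', 'b']}
separate['a'].append(1)
print(separate)      # {'a': [1], 'b': []}
```

For immutable defaults such as 0 or None this sharing is harmless, which is why the examples above behave as expected.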

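1.2 merge

This section covers merging dictionaries.

In Python 2, merging dictionaries is somewhat cumbersome: you have to use the update method, which modifies one of the dictionaries in place.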

In [1]: x = {'a':1, 'b': 2}

In [2]: y = {'b':10, 'c': 11}

In [3]: x.update(y)

In [4]: x
Out[4]: {'a': 1, 'b': 10, 'c': 11}

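To avoid modifying either dictionary in Python 2, you need a new variable and a shallow copy, or you can concatenate the two items() lists. Neither is very convenient.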

In [5]: z = x.copy()

In [6]: z.update(y)

In [7]: z
Out[7]: {'a': 1, 'b': 10, 'c': 11}

# Works in Python 2, but not in Python 3
In [8]: dict(x.items() + y.items())
Out[8]: {'a': 1, 'b': 10, 'c': 11}

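Python 3 has a similar idiom, but it uses |, which is set union (not intersection) on the items() views, and the resulting order can differ. It also requires every value to be hashable, so it fails with list values, and when the same key maps to different values in the two dictionaries, which value survives is not well defined. Because of these limitations, the approach explained further below is preferable.

# Works in Python 3, but not in Python 2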
In [9]: dict(x.items() | y.items())
Out[9]: {'c': 11, 'a': 1, 'b': 10}

In [10]: dict({'a': []}.items() | {'b': []}.items())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-deaa266c5c6d> in <module>()
----> 1 dict({'a': []}.items() | {'b': []}.items())
TypeError: unhashable type: 'list'
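
Besides the hashability requirement, the union idiom has a subtler problem when both dictionaries contain the same key with different values. A small sketch, using fresh x and y so that the key 'b' really conflicts:

```python
x = {'a': 1, 'b': 2}
y = {'b': 10, 'c': 11}

# items() views behave like sets, so | is a set union of (key, value) pairs.
pairs = x.items() | y.items()

# Both ('b', 2) and ('b', 10) survive the union; dict() keeps whichever
# one happens to be iterated last, which is not something to rely on.
conflicting = sorted(pair for pair in pairs if pair[0] == 'b')
print(conflicting)   # [('b', 2), ('b', 10)]
```

Because set iteration order is an implementation detail, dict(pairs) may end up with either value for 'b'.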

In Python 3 (3.5 and later), the most elegant and efficient approach is the following.

In [11]: z = {**x, **y}

In [12]: z
Out[12]: {'a': 1, 'b': 10, 'c': 11}
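
To make the semantics of the unpacking merge explicit, here is a short sketch with fresh inputs. It also checks the | operator for dictionaries, which exists only on Python 3.9 and later, so that part is guarded by a version test.

```python
import sys

x = {'a': 1, 'b': 2}
y = {'b': 10, 'c': 11}

# Unpacking builds a new dict; for duplicate keys the right-most value wins.
z = {**x, **y}
print(z)             # {'a': 1, 'b': 10, 'c': 11}
print(x, y)          # the inputs are left unchanged

# Python 3.9+ also offers a dedicated merge operator with the same semantics.
if sys.version_info >= (3, 9):
    assert (x | y) == z
```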

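1.3 setdefault

The setdefault method returns the value for the given key; if the key does not exist, it inserts the key with the given default value and returns that default. The benefit is that having a default makes the code more robust.

Here is a simple example.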

In [1]: d = {}

In [2]: d.setdefault('a', 1)
Out[2]: 1

In [3]: d
Out[3]: {'a': 1}

Now an example that counts characters. The ordinary version is simple and is what most people write first; the setdefault version is more concise and more Pythonic.

In [4]: s = 'sdsdes'

In [5]: d = {}

In [6]: for c in s:
   ...:     if c in d:
   ...:         d[c] += 1
   ...:     else:
   ...:         d[c] = 1
   ...:

In [7]: d
Out[7]: {'s': 3, 'd': 2, 'e': 1}

In [8]: d = {}

In [9]: for c in s:
   ...:     d[c] = d.setdefault(c, 0) + 1
   ...:

In [10]: d
Out[10]: {'s': 3, 'd': 2, 'e': 1}
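
Another common use of setdefault is grouping values into lists. The sketch below groups words by their first letter; the words list is made-up sample data.

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']  # sample data

groups = {}
for word in words:
    # setdefault returns the existing list, or inserts and returns a new one.
    groups.setdefault(word[0], []).append(word)

print(groups)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```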

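1.4 defaultdict

defaultdict is a class provided by the collections module in the standard library.

The setdefault method above is handy and usually sufficient, but defaultdict is also very useful. Here we use defaultdict to implement the same counting example.
Note that the int passed to defaultdict is the default factory: when a key is missing, int() is called and returns 0, so a missing key starts with the value 0.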

In [1]: import collections

In [2]: s = 'sdsdes'

In [3]: d = collections.defaultdict(int)

In [4]: for c in s:
   ...:     d[c] += 1
   ...:

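Compared with setdefault, defaultdict is more convenient and flexible. As in the example below, you can also pass a lambda expression, here one that gives every missing key the value 0. Supplying your own function in this way makes the program more flexible.
Notice that get('b') does not return 0 at first, but after the value has been read with square brackets, get('b') does return it. Why? This is where the __missing__ magic method comes in.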

In [5]: d
Out[5]: defaultdict(int, {'s': 3, 'd': 2, 'e': 1})

In [6]: d = collections.defaultdict(lambda: 0)

In [7]: d.get('b')

In [8]: d['b']
Out[8]: 0

In [9]: d.get('b')
Out[9]: 0

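Square-bracket access works because it goes through __getitem__, which in turn calls __missing__.
The logic is as follows: square-bracket access calls __getitem__, which checks whether the key exists. If it does, the value is returned; if not, __getitem__ checks (in the pseudocode below, with the hasattr function) whether the class defines __missing__ and calls it. The default_factory attribute is the argument passed to defaultdict, int in the earlier example; __missing__ calls this factory, stores the result under the key, and returns it. Since int() returns 0, counting starts from 0. The get method does not call __missing__, which is why get('b') returned nothing until d['b'] had stored the default.
We can use __missing__ in the same way in our own code.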

# Note: this is pseudocode; the real implementation is written in C
def __getitem__(self, key):
    if key in self.data:
        return self.data[key]
    if hasattr(self.__class__, "__missing__"):
        return self.__class__.__missing__(self, key)
    raise KeyError(key)  # no __missing__ hook defined

def __missing__(self, key):
    if self.default_factory is None:
        raise KeyError((key,))
    # store the default so later lookups (including get) find it
    self[key] = value = self.default_factory()
    return value
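
As a runnable counterpart to the pseudocode, here is a minimal sketch of a dict subclass that defines its own __missing__. CountingDict is a hypothetical name used only for illustration; it is not part of the standard library.

```python
class CountingDict(dict):
    """Hypothetical dict subclass: missing keys start at 0 and are stored."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        self[key] = 0
        return 0


counts = CountingDict()
for ch in 'sdsdes':
    counts[ch] += 1

print(counts)               # {'s': 3, 'd': 2, 'e': 1}
print(counts.get('x'))      # None: get() does not go through __missing__
print('x' in counts)        # False: nothing was inserted by get()
```

Only square-bracket lookups trigger the hook, which matches the get behavior observed with defaultdict above.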

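1.5 OrderedDict

OrderedDict is also a class provided by the collections module in the standard library.

Before Python 3.6, dictionaries did not record insertion order. Iteration followed the order of the keys in the underlying hash table, which does not match insertion order and, because hashing can involve a random seed, may even differ between runs. OrderedDict was introduced to provide a dictionary that keeps its order.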

In [1]: d = {'a': 1, 'b': 2, 'c': 3}

In [2]: for k, v in d.items():
   ...:     print(k, v)
   ...:
('a', 1)
('c', 3)
('b', 2)

# assumes `import collections` has already been run in this session
In [3]: d = collections.OrderedDict([('a', 1), ('b', 2), ('c', 3)])

In [4]: for k, v in d.items():
   ...:     print(k, v)
   ...:
('a', 1)
('b', 2)
('c', 3)

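Since Python 3.6 the built-in dict records insertion order (an implementation detail of CPython 3.6 that became a language guarantee in Python 3.7), so OrderedDict is no longer needed just for ordering. It is still needed, however, when comparisons should take order into account, as shown below.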

In [5]: d1 = {'a': 1, 'b': 2, 'c': 3}

In [6]: d2 = {'b': 2, 'c': 3, 'a': 1}

In [7]: d1 == d2
Out[7]: True

In [8]: d1 = collections.OrderedDict([('a', 1), ('b', 2), ('c', 3)])

In [9]: d2 = collections.OrderedDict([('b', 2), ('c', 3), ('a', 1)])

In [10]: d1 == d2
Out[10]: False
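
One detail that is easy to miss: the order-sensitive comparison only applies when both operands are OrderedDict instances. A short sketch, assuming Python 3.7 or later for the last line:

```python
from collections import OrderedDict

od1 = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
od2 = OrderedDict([('b', 2), ('c', 3), ('a', 1)])
plain = {'b': 2, 'c': 3, 'a': 1}

print(od1 == od2)     # False: both sides are OrderedDict, so order counts
print(od1 == plain)   # True: against a regular dict, order is ignored
print(list(plain))    # ['b', 'c', 'a'] on Python 3.7+, insertion order
```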

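2. Sets

2.1 Set comprehensions

Sets have comprehensions too, written with {}.

Basic usage is shown below; nested loops and conditions are supported. Note that a set comprehension removes duplicates and that the order of elements may change.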

In [1]: {s for s in [1, 2, 1, 0]}
Out[1]: {0, 1, 2}

In [2]: {s for s in [1, 2, 3] if s % 2}
Out[2]: {1, 3}

In [3]: {(m, n) for n in range(2) for m in range(3, 5)}
Out[3]: {(3, 0), (3, 1), (4, 0), (4, 1)}
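
To make the nesting order of the last comprehension explicit, the sketch below compares it with the equivalent pair of for loops; the name expected is only illustrative.

```python
# Comprehension form, as in the nested example.
pairs = {(m, n) for n in range(2) for m in range(3, 5)}

# Equivalent explicit loops: the left-most `for` is the outer loop.
expected = set()
for n in range(2):
    for m in range(3, 5):
        expected.add((m, n))

print(pairs == expected)   # True
print(sorted(pairs))       # [(3, 0), (3, 1), (4, 0), (4, 1)]
```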

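2.2 Hashing

Hashing is implemented by the __hash__ magic method.

Using a list as a dictionary key raises an error, which shows that lists are not hashable.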

In [1]: d = {}

In [2]: d[[(1, 2)]] = 4
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-103-f6e3c19df51b> in <module>()
----> 1 d[[(1, 2)]] = 4
TypeError: unhashable type: 'list'
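
When a list-like key is needed, the usual workaround is an immutable equivalent. A minimal sketch showing tuple and frozenset keys, and that a tuple containing a list is still rejected:

```python
d = {}

# A tuple of hashable items is itself hashable, so it can be a key.
d[(1, 2)] = 4

# frozenset is the immutable, hashable counterpart of set.
d[frozenset({1, 2})] = 5

print(d[(1, 2)])               # 4
print(d[frozenset({2, 1})])    # 5: equal frozensets hash the same

# A tuple that contains a list is still unhashable.
try:
    hash(([1, 2],))
except TypeError as exc:
    print('TypeError:', exc)
```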

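When we use an instance of a custom class as a dictionary key, the assignment works, but the value cannot be looked up again: d[A(1)] creates a new object that is not the one stored earlier, and by default instances are hashed and compared by identity. Assigning again therefore leaves the dictionary with two entries.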

In [3]: class A:
   ...:     def __init__(self, a):
   ...:         self.a = a
   ...:

In [4]: d = {}

In [5]: d[A(1)] = 4

In [6]: d
Out[6]: {<__main__.A at 0x108992400>: 4}

In [7]: d[A(1)]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-129-dadaf80862d6> in <module>()
----> 1 d[A(1)]
KeyError: <__main__.A object at 0x1089b9a90>

In [8]: d[A(1)] = 4

In [9]: d
Out[9]: {<__main__.A at 0x108992400>: 4, <__main__.A at 0x1089b9860>: 4}

What if equal objects should only be stored once? Define __eq__ to decide whether two instances are equal, and define __hash__ so that instances can serve as dictionary keys. In Python 3, a class that defines __eq__ without __hash__ becomes unhashable, so using its instances as keys raises an error.

In [10]: class A:
    ...:     def __init__(self, a, b):
    ...:         self.a = a
    ...:         self.b = b
    ...:
    ...:     @property
    ...:     def key(self):
    ...:         return (self.a, self.b)
    ...:
    ...:     def __eq__(self, other):
    ...:         if not isinstance(other, A):
    ...:             return NotImplemented
    ...:         return self.key == other.key
    ...:
    ...:     def __hash__(self):
    ...:         return hash(self.key)
    ...:

# Duplicate keys are collapsed in a dictionary
In [11]: d = {}

In [12]: d[A(1, 2)] = 4

In [13]: d
Out[13]: {<__main__.A at 0x108992cc0>: 4}

In [14]: d[A(1, 2)] = 4

In [15]: d
Out[15]: {<__main__.A at 0x108992cc0>: 4}

# Duplicate elements are collapsed in a set
In [16]: s = set()

In [17]: s.add(A(1, 2))

In [18]: s
Out[18]: {<__main__.A at 0x1089cacc0>}

In [19]: s.add(A(1, 2))

In [20]: s
Out[20]: {<__main__.A at 0x1089cacc0>}
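
The same idea as a standalone script, together with a check of what happens when only __eq__ is defined. Point and EqOnly are hypothetical classes used for illustration, and the sketch assumes the attributes are not modified after an instance has been used as a key.

```python
class Point:
    """Hypothetical value-like class; the attributes are treated as read-only."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.a, self.b) == (other.a, other.b)

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash((self.a, self.b))


class EqOnly:
    """Defines __eq__ but not __hash__, so Python 3 sets __hash__ to None."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


print(len({Point(1, 2), Point(1, 2)}))   # 1
print(Point(1, 2) in {Point(1, 2): 4})   # True

try:
    {EqOnly()}
except TypeError as exc:
    print('TypeError:', exc)
```

If an attribute that feeds __hash__ changes while the object is stored in a dictionary or set, later lookups can fail, which is why hashable objects are normally kept immutable.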