attrs: a powerful class decorator

Python third-party libraries

Published: 2018-08-21

Updated: 2020-12-09

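This post reposts a blog article by the Zhihu author Dong Weiming (董伟明). It will be removed immediately upon any copyright complaint.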

Third-party extension libraries: attrs

1. Programming pain points

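After writing a lot of Python, especially when developing and maintaining large projects, you may find, as I do, that writing Python classes is tiring. Why? Take a product class as an example. Its __init__ looks like this: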

class Product(object):
    def __init__(self, id, author_id, category_id, brand_id, spu_id,
                 title, item_id, n_comments, creation_time,
                 update_time, source='', parent_id=0, ancestor_id=0):
        self.id = id
        self.author_id = author_id
        self.category_id = category_id
        ...

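Pain point 1: initialization parameters

The most obvious feature of the product class above is its large number of initialization parameters, each of which has to be assigned to the instance with self.xx = xx. I remember seeing a class with more than 30 parameters; the assignments alone filled more than a screen of its __init__ method.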

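Pain point 2: printing objects

If the product class above does not define a __repr__ method, printing an instance is unhelpful: all you get is the class name and a memory address, as shown below.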

In : p
Out: <test.Product at 0x10ba6a320>

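Because the class has so many parameters, you usually pick a few of them to show. That makes instances much friendlier to inspect, but you still have to write a __repr__ method by hand for every class.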

def __repr__(self):
    return '{}(id={}, author_id={}, category_id={}, brand_id={})'.format(
        self.__class__.__name__, self.id, self.author_id, self.category_id,
        self.brand_id)

In : p
Out: Product(id=1, author_id=100001, category_id=2003, brand_id=20)

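Pain point 3: comparing objects

Sometimes you need to check whether two objects are equal, or even which one is greater. The standard-library decorator functools.total_ordering helps, but you still have to define at least two comparison magic methods: __eq__ plus one of __lt__, __le__, __gt__ or __ge__.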

def __eq__(self, other):
    if not isinstance(other, self.__class__):
        return NotImplemented
    return (self.id, self.author_id, self.category_id, self.brand_id) == (
        other.id, other.author_id, other.category_id, other.brand_id)

def __lt__(self, other):
    if not isinstance(other, self.__class__):
        return NotImplemented
    return (self.id, self.author_id, self.category_id, self.brand_id) < (
        other.id, other.author_id, other.category_id, other.brand_id)
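For reference, here is a minimal sketch of the functools.total_ordering route mentioned above, trimmed to the four comparison fields. The _key helper is a hypothetical name introduced here so the tuple is written only once; total_ordering then derives __le__, __gt__ and __ge__ from __eq__ and __lt__.

```python
from functools import total_ordering


@total_ordering
class Product(object):
    """Cut-down Product with only the four fields used for comparison."""

    def __init__(self, id, author_id, category_id, brand_id):
        self.id = id
        self.author_id = author_id
        self.category_id = category_id
        self.brand_id = brand_id

    def _key(self):
        # Hypothetical helper: the tuple that defines equality and ordering.
        return (self.id, self.author_id, self.category_id, self.brand_id)

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return self._key() < other._key()


a = Product(1, 100001, 2003, 20)
b = Product(3, 100001, 2003, 20)
print(a < b, a <= b, a > b, a >= b)  # True True False False
```

Even with the helper, every class still needs its own __eq__ and __lt__, which is exactly the repetition described above.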

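Pain point 4: deduplicating objects

In some scenarios we want to deduplicate objects, which we can do by adding a __hash__ method. It has to agree with __eq__: objects that compare equal must have equal hashes.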

def __hash__(self):
    return hash((self.id, self.author_id, self.category_id, self.brand_id))

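Now there is no need to compare a pile of objects one by one: a set removes the duplicates for you. In the session below, p1 and p2 differ only in title and update_time, which neither __eq__ nor __hash__ looks at, so the set keeps just one of them and drops the other.

In : p1 = Product(1, 100001, 2003, 20, 1002393002, 'test product 1', 2000001, 100, None, 1)

In : p2 = Product(1, 100001, 2003, 20, 1002393002, 'test product 2', 2000001, 100, None, 2)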

In : {p1, p2}
Out: {Product(id=1, author_id=100001, category_id=2003, brand_id=20)}
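A small sketch of why __hash__ has to be written by hand: in Python 3, a class that defines __eq__ without __hash__ gets __hash__ set to None, so its instances cannot go into a set at all. The class names OnlyEq and EqAndHash are hypothetical.

```python
class OnlyEq(object):
    def __init__(self, id):
        self.id = id

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return self.id == other.id


class EqAndHash(OnlyEq):
    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self.id)


print(OnlyEq.__hash__)  # None: defining __eq__ alone disables hashing

try:
    unique = {OnlyEq(1), OnlyEq(1)}
except TypeError as exc:
    print("TypeError:", exc)  # instances of OnlyEq are unhashable

print(len({EqAndHash(1), EqAndHash(1), EqAndHash(2)}))  # 2
```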

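Pain point 5: exporting attributes

I like to give classes a method such as to_dict, to_json or as_dict that packs their attributes into a dictionary and returns it. In practice, I end up writing one for almost every class.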

def to_dict(self):
    return {
        'id': self.id,
        'author_id': self.author_id,
        'category_id': self.category_id,
        # ... one entry for each remaining attribute
    }

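Of course, if there is no special reason, you can simply use vars(self). Listing the key-value pairs explicitly, as above, is more precise, because it exports only what you want. In the example below, vars would include _a in the result even though _a should not be exported, so vars is not a good fit here.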

def to_dict(self):
    self._a = 1
    return vars(self)
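The difference between the two to_dict styles shows up in a short sketch. The Product class here is a cut-down, hypothetical version with only two public fields; note also that vars returns the instance's own __dict__, not a copy.

```python
class Product(object):
    def __init__(self, id, title):
        self.id = id
        self.title = title
        self._a = 1  # internal state that should not be exported

    def to_dict_vars(self):
        # vars(self) is the instance __dict__ itself: it includes _a,
        # and mutating the returned dict mutates the object.
        return vars(self)

    def to_dict(self):
        # Explicit key-value pairs: exactly the wanted fields, in a new dict.
        return {'id': self.id, 'title': self.title}


p = Product(1, 'test product')
print(p.to_dict_vars())  # {'id': 1, 'title': 'test product', '_a': 1}
print(p.to_dict())       # {'id': 1, 'title': 'test product'}
```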

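Let's pause here and think: how many times have we written self.id, self.author_id and self.category_id? Is there a way to add all of this automatically when a class is created, and free developers from the chore? That is exactly what attrs, and the dataclasses module added to the standard library in Python 3.7, are for, and both can do more besides.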

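2. Solutions

[1] The attrs module

attrs is a project designed and implemented by Python developer Hynek Schlawack, created precisely to solve the pain points above. With attrs, the class above can be written like this: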

import attr

@attr.s(hash=True)
class Product(object):
    # These four fields drive __repr__, comparison and hashing.
    id = attr.ib()
    author_id = attr.ib()
    category_id = attr.ib()
    brand_id = attr.ib()
    # The rest are excluded from __repr__, comparison and hashing.
    spu_id = attr.ib(repr=False, cmp=False, hash=False)
    title = attr.ib(repr=False, cmp=False, hash=False)
    item_id = attr.ib(repr=False, cmp=False, hash=False)
    n_comments = attr.ib(repr=False, cmp=False, hash=False)
    creation_time = attr.ib(repr=False, cmp=False, hash=False)
    update_time = attr.ib(repr=False, cmp=False, hash=False)
    source = attr.ib(default='', repr=False, cmp=False, hash=False)
    parent_id = attr.ib(default=0, repr=False, cmp=False, hash=False)
    ancestor_id = attr.ib(default=0, repr=False, cmp=False, hash=False)

That's it: none of the magic methods mentioned above need to be written by hand any more.

In : p1 = Product(1, 100001, 2003, 20, 1002393002, 'test product 1', 2000001, 100, None, 1)

In : p2 = Product(1, 100001, 2003, 20, 1002393002, 'test product 2', 2000001, 100, None, 2)

In : p3 = Product(3, 100001, 2003, 20, 1002393002, 'test product 3', 2000001, 100, None, 3)

In : p1
Out: Product(id=1, author_id=100001, category_id=2003, brand_id=20)

In : p1 == p2
Out: True

In : p1 > p3
Out: False

In : {p1, p2, p3}
Out:
{Product(id=1, author_id=100001, category_id=2003, brand_id=20),
 Product(id=3, author_id=100001, category_id=2003, brand_id=20)}

In : attr.asdict(p1)
Out:
{'ancestor_id': 0,
 'author_id': 100001,
 'brand_id': 20,
 'category_id': 2003,
 'creation_time': None,
 'id': 1,
 'item_id': 2000001,
 'n_comments': 100,
 'parent_id': 0,
 'source': '',
 'spu_id': 1002393002,
 'title': 'test product 1',
 'update_time': 1}

In : attr.asdict(p1, filter=lambda a, v: a.name in ('id', 'title', 'author_id'))
Out: {'author_id': 100001, 'id': 1, 'title': 'test product 1'}
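Besides a hand-written lambda, attrs ships ready-made filters in attr.filters. A minimal sketch, assuming a reasonably recent attrs release and a small hypothetical Item class; include and exclude are given Attribute objects taken from attr.fields:

```python
import attr


@attr.s
class Item(object):
    id = attr.ib()
    title = attr.ib()
    source = attr.ib(default='')


item = Item(1, 'test product')
fields = attr.fields(Item)

# Keep only the listed attributes.
print(attr.asdict(item, filter=attr.filters.include(fields.id, fields.title)))
# {'id': 1, 'title': 'test product'}

# Drop one attribute and keep the rest.
print(attr.asdict(item, filter=attr.filters.exclude(fields.title)))
# {'id': 1, 'source': ''}
```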

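Of course, my example places a lot of requirements on the attributes, so some attr.ib() calls get long. Doesn't this way of defining a class look a bit like an ORM? The relationship between the object and its attributes is declared directly and stays out of the class's code logic.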

[2] The dataclasses module

Python 3.7 added a new module, dataclasses, based on PEP 557. On Python 3.6 you can install the backport with pip:

pip install dataclasses

To address the same pain points, rewrite the Product class like this. Note that in the released Python 3.7 the decorator's hash argument is named unsafe_hash; @dataclass(hash=True) raises a TypeError.

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(unsafe_hash=True, order=True)
class Product(object):
    # These four fields drive __repr__, comparison and hashing.
    id: int
    author_id: int
    category_id: int
    brand_id: int
    # The rest are excluded from __repr__, comparison and hashing.
    spu_id: int = field(hash=False, repr=False, compare=False)
    title: str = field(hash=False, repr=False, compare=False)
    item_id: int = field(hash=False, repr=False, compare=False)
    n_comments: int = field(hash=False, repr=False, compare=False)
    creation_time: Optional[datetime] = field(default=None, repr=False, compare=False, hash=False)
    update_time: Optional[datetime] = field(default=None, repr=False, compare=False, hash=False)
    source: str = field(default='', repr=False, compare=False, hash=False)
    parent_id: int = field(default=0, repr=False, compare=False, hash=False)
    ancestor_id: int = field(default=0, repr=False, compare=False, hash=False)

A quick check shows it behaves as expected. The one gap is that dataclasses.asdict cannot filter the attributes it returns, but overall it does the job. Still, have you noticed anything odd?

In : p1 = Product(1, 100001, 2003, 20, 1002393002, 'test product 1', 2000001, 100, None, 1)

In : p2 = Product(1, 100001, 2003, 20, 1002393002, 'test product 2', 2000001, 100, None, 2)

In : p3 = Product(3, 100001, 2003, 20, 1002393002, 'test product 3', 2000001, 100, None, 3)

In : p1
Out: Product(id=1, author_id=100001, category_id=2003, brand_id=20)

In : p1 == p2
Out: True

In : p1 > p3
Out: False

In : {p1, p2, p3}
Out:
{Product(id=1, author_id=100001, category_id=2003, brand_id=20),
 Product(id=3, author_id=100001, category_id=2003, brand_id=20)}

In : from dataclasses import asdict

In : asdict(p1)
Out:
{'ancestor_id': 0,
 'author_id': 100001,
 'brand_id': 20,
 'category_id': 2003,
 'creation_time': None,
 'id': 1,
 'item_id': 2000001,
 'n_comments': 100,
 'parent_id': 0,
 'source': '',
 'spu_id': 1002393002,
 'title': 'test product 1',
 'update_time': 1}
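Since dataclasses.asdict has no filter parameter, one workaround is to build the dictionary from dataclasses.fields yourself, or to filter the (name, value) pairs inside a custom dict_factory. A sketch under those assumptions; asdict_only and Item are hypothetical names:

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Item:
    id: int
    title: str
    source: str = ''


def asdict_only(obj, names):
    """Hypothetical helper: like asdict(), restricted to the given field names."""
    return {f.name: getattr(obj, f.name) for f in fields(obj) if f.name in names}


item = Item(1, 'test product')
print(asdict_only(item, {'id', 'title'}))
# {'id': 1, 'title': 'test product'}

# Alternative: dict_factory receives a list of (name, value) pairs.
print(asdict(item, dict_factory=lambda pairs: {k: v for k, v in pairs if k != 'source'}))
# {'id': 1, 'title': 'test product'}
```

Unlike asdict, the comprehension neither recurses into nested dataclasses nor copies values; the dict_factory version does recurse, and applies the same filter at every level.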

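Comparing the two modules

The code for the two approaches does differ a bit, but don't they look very much alike? In fact attrs appeared long before dataclasses, which looks more like a borrowing. dataclasses can be seen as attrs with mandatory type annotations, and its features are a subset of attrs's. So why did Python 3.7 add a cut-down attrs instead of bringing attrs itself into the standard library?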
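One concrete side of the "mandatory type annotations" point: a dataclass only turns annotated class attributes into fields, while classic attrs declares fields with attr.ib() and needs no annotation. A minimal sketch with hypothetical classes D and A:

```python
from dataclasses import dataclass, fields

import attr


@dataclass
class D:
    x: int
    y = 0  # no annotation: a plain class attribute, not a dataclass field


@attr.s
class A(object):
    x = attr.ib()
    y = attr.ib(default=0)  # a field without any type annotation


print([f.name for f in fields(D)])       # ['x']
print(D(1))                              # D(x=1)
print([a.name for a in attr.fields(A)])  # ['x', 'y']
```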

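Glyph Lefkowitz pointedly opened an issue titled "why not just attrs?". When I opened it, before reading on, I guessed the answer would be: "attrs supports Python 3.6 and even 2.7, so moving it into the standard library would mean a refactor that drops that baggage, and the attrs author did not want to go that way." Reading through the discussion showed that this was not the case.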

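The issue is a fascinating read, with many Python developers joining in. In the end Guido van Rossum (gvanrossum) closed the discussion, stating clearly that he did not want attrs in the standard library. Donald Stufft asked him directly why. Guido did explain, but I still see it as the "dictator" half of "benevolent dictator" at work; the Python community has never been especially open in its attitude. Nor does the explanation in PEP 557 of why attrs was not used convince me at all.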

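I side with attrs and recommend it to everyone! Not only because attrs supports older Python versions, but because attrs genuinely adds features from the developer's point of view. I believe attrs will go further.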

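3. Advanced features

Beyond the above, attrs supports many advanced uses, such as attribute validation, automatic type conversion, immutable attributes and type annotations. Here are three that I find especially useful.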
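Immutable attributes and type annotations do not get their own subsections, so here is a minimal sketch of both, using frozen=True, attr.evolve and auto_attribs=True (the annotation syntax needs Python 3.6+). Point and Annotated are hypothetical example classes.

```python
import attr


@attr.s(frozen=True)
class Point(object):
    x = attr.ib()
    y = attr.ib()


p = Point(1, 2)
try:
    p.x = 10
except attr.exceptions.FrozenInstanceError:
    print("Point is immutable")

# evolve() returns a modified copy instead of mutating in place.
print(attr.evolve(p, x=10))  # Point(x=10, y=2)


@attr.s(auto_attribs=True)
class Annotated(object):
    # With auto_attribs=True, annotated class attributes become fields.
    id: int
    title: str = ''


print(Annotated(1))  # Annotated(id=1, title='')
```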

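[1] Attribute validation

Business code often needs to validate the type and content of object attributes. attrs offers two ways to do this.

# 1. The validator decorator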

>>> @attr.s
... class C(object):
...     x = attr.ib()
...     @x.validator
...     def check(self, attribute, value):
...         if value > 42:
...             raise ValueError("x must be smaller or equal to 42")

>>> C(42)
C(x=42)
>>> C(43)
Traceback (most recent call last):
   ...
ValueError: x must be smaller or equal to 42

# 2. The validator argument

>>> def x_smaller_than_y(instance, attribute, value):
...     if value >= instance.y:
...         raise ValueError("'x' has to be smaller than 'y'!")

>>> @attr.s
... class C(object):
...     x = attr.ib(validator=[attr.validators.instance_of(int),
...                            x_smaller_than_y])
...     y = attr.ib()

>>> C(x=3, y=4)
C(x=3, y=4)
>>> C(x=4, y=3)
Traceback (most recent call last):
   ...
ValueError: 'x' has to be smaller than 'y'!
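One caveat worth knowing: with the classic @attr.s decorator, validators run in __init__ but not when an attribute is reassigned later. attr.validate re-runs them on an existing instance; newer attrs releases can also validate on assignment through the on_setattr option. A minimal sketch:

```python
import attr


@attr.s
class C(object):
    x = attr.ib(validator=attr.validators.instance_of(int))


c = C(1)
c.x = "not an int"  # with @attr.s, assignment after __init__ is not validated

try:
    attr.validate(c)  # re-runs every validator on the existing instance
except TypeError as exc:
    print("invalid:", exc)
```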

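[2] Attribute type conversion

Python does not check the types of the values passed in, so type errors slip in easily. attrs can convert values automatically with the converter argument.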

>>> @attr.s
... class C(object):
...     x = attr.ib(converter=int)

>>> o = C("1")
>>> o.x
1
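Converters run before validators, so a converter and a validator can be combined, and attr.converters.optional wraps a converter so that None passes through unchanged. A minimal sketch with a hypothetical class C:

```python
import attr


@attr.s
class C(object):
    # The converter runs first, so "1" becomes 1 and instance_of(int) passes.
    x = attr.ib(converter=int, validator=attr.validators.instance_of(int))
    # optional() lets None through untouched and converts everything else.
    y = attr.ib(converter=attr.converters.optional(int), default=None)


print(C("1"))       # C(x=1, y=None)
print(C("1", "2"))  # C(x=1, y=2)
```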

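[3] Metadata

Attributes can also carry metadata, which is genuinely useful: an attribute is then more than a bare value, and carrying metadata alongside it makes it more flexible and meaningful. You no longer need to store the metadata an attribute needs somewhere else.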

>>> @attr.s
... class C(object):
...    x = attr.ib(metadata={'my_metadata': 1})

>>> attr.fields(C).x.metadata
mappingproxy({'my_metadata': 1})
>>> attr.fields(C).x.metadata['my_metadata']
1
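As an illustration of how metadata might be used, the sketch below stores a database column name on each attribute and reads it back through attr.fields. The 'column' key and the to_row helper are hypothetical, not part of attrs:

```python
import attr


@attr.s
class Product(object):
    # Hypothetical metadata key 'column': the database column for each field.
    id = attr.ib(metadata={'column': 'product_id'})
    title = attr.ib(metadata={'column': 'product_title'})
    note = attr.ib(default='')  # no metadata: an empty mapping


def to_row(obj):
    """Hypothetical helper: map attribute values to column names via metadata."""
    return {
        a.metadata['column']: getattr(obj, a.name)
        for a in attr.fields(type(obj))
        if 'column' in a.metadata
    }


print(to_row(Product(1, 'test product')))
# {'product_id': 1, 'product_title': 'test product'}
```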