Python调用系统命令卡住

Bug解决之道

发布日期: 2021-10-16

更新日期: 2021-10-25

文章字数: 1k

阅读时长: 4 分

阅读次数:

简单记录一下问题的排除过程，和对应的处理方式！

Python调用系统命令卡住

今天在 Twitter 上面看到 laixintao 发了一个小心，说的是发现在 Python 程序下直接调用系统命令时，如果返回信息量过大的话，子进程输出信息量 stdout 会把管道(PIPE)的缓冲器打满，导致主进程卡住。

Python调用系统命令卡住

大概看了下，使用 subprocess 模块的 Popen 调用外部程序，如果 stdout 或 stderr 参数是 pipe，并且程序输出超过操作系统的 pipe size 时，并且使用 Popen.wait() 方式等待程序结束获取返回值，会导致死锁，程序卡在 wait() 调用上。我们可以通过操作系统自带的命令，可以看到，linux 的 pipe 默认值就是 4M，所以超过这个值的话，就可能会出现子进程卡住的问题。像这样的问题，如果真的出现了，确实还是不太好定位原因的(不会有错误消息)。因为，你可以在网上搜了一圈都不知道该怎么解决，重点其实不是解决该问题的方法，而且不知道是这方面的原因。

# ubuntu
$ man 7 pip
PIPE_BUF
    POSIX.1  says  that  write(2)s  of  less than PIPE_BUF bytes must be atomic: the output data is written
    to the pipe as a contiguous sequence.  Writes of more than PIPE_BUF bytes may be nonatomic: the kernel
    may interleave the data with data written by other processes.
    POSIX.1 requires PIPE_BUF to be at least 512 bytes.  (On Linux, PIPE_BUF is 4096 bytes.)  The precise
    semantics depend on whether the file descriptor is nonblocking (O_NONBLOCK), whether there are multiple
    writers to the pipe, and on n, the number of bytes to be writ‐ten:

    O_NONBLOCK disabled, n <= PIPE_BUF
        All n bytes are written atomically; write(2) may block if there is not room for n bytes to be written immediately

    O_NONBLOCK enabled, n <= PIPE_BUF
        If there is room to write n bytes to the pipe, then write(2) succeeds immediately, writing all n bytes;
        otherwise write(2) fails, with errno set to EAGAIN.

    O_NONBLOCK disabled, n > PIPE_BUF
        The write is nonatomic: the data given to write(2) may be interleaved with write(2)s by other process;
        the write(2) blocks until n bytes have been written.

    O_NONBLOCK enabled, n > PIPE_BUF
        If  the  pipe is full, then write(2) fails, with errno set to EAGAIN.  Otherwise, from 1 to n bytes
        may be written (i.e., a "partial write" may occur; the caller should check the return value from write(2)
        to see how many bytes were actually written), and these bytes may be interleaved with writes by other processes.

我们知道在 Python 2.4 添加了子进程模块，可以使用 Popen 来调用系统命令，得到对应命令的正常输出和异常输出。Popen() 其实会执行 fork() 来创建一个子进程，在其中再执行对应命令。

from subprocess import Popen, PIPE

p = Popen(cmd, stdout=PIPE, stderr=PIPE)
(out,err) = p.wait()

但是，这里就有一个问题，其实在 Popen 的文档里面已经有说明了：“注意数据读取是在内存中缓冲的，所以如果数据大小很大或者没有限制，不要使用这种方法”。显然，这个警告实际上意味着：“如果读取的数据有可能超过几页，那么这将使你的代码陷入死锁”。注意，这里是 subprocess.wait 才会卡住。

# Linux默认的pipe-size是64kb
from subprocess import Popen, PIPE

def dd_test(index, dd_size):
    print(f'>>> [{index+1}] the dd sizs is {dd_size} to start...')
    cmd = f'/bin/dd if=/dev/urandom bs=1024 count={dd_size} 2>/dev/null'
    p = Popen(cmd, stdout=PIPE, stderr=PIPE, shell=True)
    p.wait()
    print(f'>>> [{index+1}] end the dd test.')

if __name__ == '__main__':
    test_data_list = [63, 64, 65]
    for index, dd_size in enumerate(test_data_list):
        dd_test(index=index, dd_size=dd_size)

# 上述代码的输出信息
➜ python3 dd_test.py
>>> [1] the dd sizs is 63 to start...
>>> [1] end the dd test.
>>> [2] the dd sizs is 64 to start...
>>> [2] end the dd test.
>>> [3] the dd sizs is 65 to start...

其实，这个问题的解决方案非常简单，就是将 stdout 和 stderr 不要设置为使用 PIPE，而是使用适当的文件对象。这些对象将接受合理数量的数据，而不是 4M。
- 使用 communicate() 方法
- 使用 StringIO 库

Python调用系统命令卡住

from subprocess import Popen, PIPE

p = Popen(cmd, stdout=PIPE, stderr=PIPE)
(out,err) = p.communicate()

from subprocess import Popen, PIPE

def run(self, cmd):
    out = StringIO()
    p = Popen(cmd, stderr=PIPE, stdout=PIPE, stdin=PIPE, shell=True)
    p_return_code = p.returncode
    while p_return_code is None:
        data = p.stdout.read()
        out.write(data)
        p.poll()

    if p_return_code != 0:
        print(f'Run `{cmd}` with exit code {p_return_code} ...\n{p.stderr.read()}\n')
        os.sys.exit(p_return_code)
    return out.getvalue()