Purpose
For work I wrote a data transfer tool that needs to use as much of the available bandwidth as possible. Since the deployment involves cross-region transfers (east-west or north-south across half of China), it also has to hold up on weak networks.
Step one is figuring out how to simulate weak-network conditions for testing.
Tools
tc
tc stands for Traffic Control; it uses qdiscs (queueing disciplines, see tc(8)) to emulate different network environments.
qdisc is short for 'queueing discipline' and it is elementary to understanding traffic control. Whenever the kernel needs to send a packet to an interface, it is enqueued to the qdisc configured for that interface. Immediately afterwards, the kernel tries to get as many packets as possible from the qdisc, for giving them to the network adaptor driver.
A simple QDISC is the 'pfifo' one, which does no processing at all and is a pure First In, First Out queue. It does however store traffic when the network interface can't handle it momentarily.
qdisc
qdiscs come in two flavors: classful and classless.
A classful qdisc builds a tree: every class has exactly one parent, and a class can contain several children.
tc-netem(8), used below to emulate network conditions, is a classless qdisc. Used standalone it attaches at the device root, and a device carries exactly one root qdisc; a classless qdisc can, however, also hang under a class of a classful parent, as the sketch below shows.
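For context, a minimal classful sketch (assuming the same ens192 NIC as below; none of this is used in the rest of the tests): an htb root with two classes, where only traffic steered into class 1:10 picks up extra delay from a nested netem.
# htb root; unmatched traffic defaults to class 1:20
tc qdisc add dev ens192 root handle 1: htb default 20
tc class add dev ens192 parent 1: classid 1:10 htb rate 100mbit
tc class add dev ens192 parent 1: classid 1:20 htb rate 1gbit
# the classless netem nested under class 1:10
tc qdisc add dev ens192 parent 1:10 handle 10: netem delay 50ms
# steer iperf3 traffic (TCP port 5201) into the delayed class
tc filter add dev ens192 parent 1: protocol ip u32 match ip dport 5201 0xffff flowid 1:10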
The basic command to attach a classless qdisc to a device is tc qdisc add dev DEV root QDISC QDISC-PARAMETERS, and to remove it, tc qdisc del dev DEV root.
Taking a NIC named ens192 as the example, netem is attached with:
tc qdisc add dev ens192 root netem
Once attached, list each interface's qdisc configuration:
tc qdisc show
qdisc noqueue 0: dev lo root refcnt 2
qdisc netem 8002: dev ens192 root refcnt 9 limit 1000
Baseline
First, measure the connection under normal conditions.
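All the iperf3 runs below assume the peer, 192.168.2.9 (the same machine that carries the netem qdisc), is running the stock server side:
iperf3 -s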
iperf3 -c 192.168.2.9
Connecting to host 192.168.2.9, port 5201
[ 5] local 192.168.7.220 port 60344 connected to 192.168.2.9 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.01 sec 116 MBytes 964 Mbits/sec
[ 5] 1.01-2.01 sec 113 MBytes 949 Mbits/sec
[ 5] 2.01-3.01 sec 113 MBytes 950 Mbits/sec
[ 5] 3.01-4.00 sec 112 MBytes 949 Mbits/sec
[ 5] 4.00-5.00 sec 114 MBytes 949 Mbits/sec
[ 5] 5.00-6.01 sec 113 MBytes 949 Mbits/sec
[ 5] 6.01-7.01 sec 114 MBytes 949 Mbits/sec
[ 5] 7.01-8.01 sec 112 MBytes 949 Mbits/sec
[ 5] 8.01-9.01 sec 114 MBytes 950 Mbits/sec
[ 5] 9.01-10.00 sec 112 MBytes 949 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 1.11 GBytes 951 Mbits/sec sender
[ 5] 0.00-10.02 sec 1.11 GBytes 949 Mbits/sec receiver
iperf Done.
Latency
The delay synopsis from the manpage:
# SYNOPSIS
DELAY := delay TIME [ JITTER [ CORRELATION ]]
[ distribution { uniform | normal | pareto | paretonormal }]
Parameters of an existing netem qdisc are modified with tc qdisc change dev DEV root netem delay ...
Example: add 10±5 ms of latency to the ens192 NIC.
tc qdisc change dev ens192 root netem delay 10ms 5ms
tc qdisc show dev ens192
qdisc netem 8002: root refcnt 9 limit 1000 delay 10ms 5ms
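The jitter accepts two further knobs from the synopsis above, a correlation percentage and a distribution; a hedged sketch, not used in the runs below:
# draw each delay from a normal distribution around 100ms ± 20ms
tc qdisc change dev ens192 root netem delay 100ms 20ms distribution normal
# or correlate successive delays by 25% for burstier latency
tc qdisc change dev ens192 root netem delay 100ms 20ms 25%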
Pinging the delayed machine from another host shows the added latency.
ping 192.168.2.9
Pinging 192.168.2.9 with 32 bytes of data:
Reply from 192.168.2.9: bytes=32 time=6ms TTL=64
Reply from 192.168.2.9: bytes=32 time=14ms TTL=64
Reply from 192.168.2.9: bytes=32 time=7ms TTL=64
Reply from 192.168.2.9: bytes=32 time=8ms TTL=64
Ping statistics for 192.168.2.9:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 6ms, Maximum = 14ms, Average = 8ms
To test TCP bandwidth utilization under high latency, reconfigure the delay to 300±50 ms and run iperf3 again.
iperf3 -c 192.168.2.9
Connecting to host 192.168.2.9, port 5201
[ 5] local 192.168.7.220 port 60280 connected to 192.168.2.9 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 256 KBytes 2.09 Mbits/sec
[ 5] 1.00-2.00 sec 640 KBytes 5.25 Mbits/sec
[ 5] 2.00-3.01 sec 1.75 MBytes 14.6 Mbits/sec
[ 5] 3.01-4.01 sec 3.38 MBytes 28.4 Mbits/sec
[ 5] 4.01-5.01 sec 8.12 MBytes 68.0 Mbits/sec
[ 5] 5.01-6.00 sec 10.0 MBytes 84.5 Mbits/sec
[ 5] 6.00-7.01 sec 11.6 MBytes 96.6 Mbits/sec
[ 5] 7.01-8.01 sec 11.9 MBytes 99.7 Mbits/sec
[ 5] 8.01-9.01 sec 11.6 MBytes 97.9 Mbits/sec
[ 5] 9.01-10.00 sec 11.8 MBytes 98.8 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 71.0 MBytes 59.5 Mbits/sec sender
[ 5] 0.00-10.33 sec 70.0 MBytes 56.8 Mbits/sec receiver
iperf Done.
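The slow ramp-up is TCP slow start pacing itself over 300 ms round trips, and the ~100 Mbit/s plateau is consistent with the socket buffers capping the window: the bandwidth-delay product here is 1 Gbit/s × 0.3 s, roughly 37.5 MB in flight, far beyond stock Linux buffer limits (a 4 MB cap over 0.3 s works out to about 107 Mbit/s, right where the curve flattens). A hedged sketch of the usual remedies, assuming Linux on both ends and illustrative values:
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
# raise the per-socket maximum (third field) to 64 MB on both machines
sysctl -w net.ipv4.tcp_rmem="4096 131072 67108864"
sysctl -w net.ipv4.tcp_wmem="4096 16384 67108864"
# or request a bigger window from iperf3 directly
iperf3 -c 192.168.2.9 -w 32M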
Packet loss
The loss synopsis from the manpage:
# SYNOPSIS
LOSS := loss { random PERCENT [ CORRELATION ] |
state p13 [ p31 [ p32 [ p23 [ p14]]]] |
gemodel p [ r [ 1-h [ 1-k ]]] } [ ecn ]
Try attaching a 20% random-loss policy to the ens192 NIC.
tc qdisc add dev ens192 root netem loss 20%
tc qdisc show
qdisc noqueue 0: dev lo root refcnt 2
qdisc netem 8004: dev ens192 root refcnt 9 limit 1000 loss 20%
qdisc noqueue 0: dev docker0 root refcnt 2
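Plain random loss is memoryless, while real weak links tend to drop packets in bursts. The synopsis above offers a correlation knob for that; a sketch, not exercised here:
# each packet's fate depends 25% on the previous packet's, giving burstier loss
tc qdisc change dev ens192 root netem loss 20% 25%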
A ping test:
ping 192.168.2.9 -t
Pinging 192.168.2.9 with 32 bytes of data:
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Request timed out.
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Request timed out.
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Reply from 192.168.2.9: bytes=32 time<1ms TTL=64
Request timed out.
An iperf3 TCP test:
iperf3 -c 192.168.2.9
Connecting to host 192.168.2.9, port 5201
[ 5] local 192.168.7.220 port 52329 connected to 192.168.2.9 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 115 MBytes 964 Mbits/sec
[ 5] 1.00-2.01 sec 114 MBytes 949 Mbits/sec
[ 5] 2.01-3.01 sec 113 MBytes 948 Mbits/sec
[ 5] 3.01-4.00 sec 112 MBytes 949 Mbits/sec
[ 5] 4.00-5.00 sec 113 MBytes 949 Mbits/sec
[ 5] 5.00-6.01 sec 114 MBytes 949 Mbits/sec
[ 5] 6.01-7.01 sec 114 MBytes 949 Mbits/sec
[ 5] 7.01-8.01 sec 113 MBytes 949 Mbits/sec
[ 5] 8.01-9.01 sec 114 MBytes 948 Mbits/sec
[ 5] 9.01-10.01 sec 113 MBytes 950 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.01 sec 1.11 GBytes 950 Mbits/sec sender
[ 5] 0.00-10.03 sec 1.11 GBytes 949 Mbits/sec receiver
iperf Done.
Check which congestion control algorithm is in use:
sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = cubic
The congestion control algorithm is cubic, and TCP throughput under 20% loss somehow holds at line rate; see the caveat below before giving cubic all the credit.
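A caveat: a root qdisc shapes egress only, and netem sits on the server (192.168.2.9), so in the upload test above the 20% loss falls on the server's outgoing ACK stream, not on the data stream, and cumulative ACKs tolerate that kind of loss very well. To push the data itself through the lossy path, run iperf3 in reverse mode (not rerun here), which makes the server the sender:
iperf3 -c 192.168.2.9 -R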
Corruption
The corrupt synopsis from the manpage:
corrupt PERCENT [CORRELATION]
Likewise, attach random 10% packet corruption to ens192.
tc qdisc add dev ens192 root netem corrupt 10%
tc qdisc show
qdisc noqueue 0: dev lo root refcnt 2
qdisc netem 8005: dev ens192 root refcnt 9 limit 1000 corrupt 10%
qdisc noqueue 0: dev docker0 root refcnt 2
An iperf3 test:
iperf3 -c 192.168.2.9
Connecting to host 192.168.2.9, port 5201
[ 5] local 192.168.7.220 port 50825 connected to 192.168.2.9 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.01 sec 116 MBytes 964 Mbits/sec
[ 5] 1.01-2.01 sec 112 MBytes 949 Mbits/sec
[ 5] 2.01-3.01 sec 112 MBytes 937 Mbits/sec
[ 5] 3.01-4.00 sec 113 MBytes 949 Mbits/sec
[ 5] 4.00-5.00 sec 113 MBytes 949 Mbits/sec
[ 5] 5.00-6.01 sec 114 MBytes 949 Mbits/sec
[ 5] 6.01-7.00 sec 113 MBytes 949 Mbits/sec
[ 5] 7.00-8.00 sec 111 MBytes 935 Mbits/sec
[ 5] 8.00-9.01 sec 114 MBytes 949 Mbits/sec
[ 5] 9.01-10.01 sec 113 MBytes 949 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.01 sec 1.10 GBytes 948 Mbits/sec sender
[ 5] 0.00-10.03 sec 1.10 GBytes 946 Mbits/sec receiver
iperf Done.
This corrupt test fell short of my expectations. What I wanted were corrupted packets that slip past the ones'-complement Internet checksum, so that the crc32 used during transfer and the final md5 verification would be what catches the damage. In practice, corrupt behaved just like packet loss: across six or seven transfers of a 1 GByte file, not one corruption ever got past the checksum. The suspicion that netem's corrupt simply cannot produce checksum-defeating damage looks right: as implemented, corrupt flips a single bit at a random offset, and the 16-bit ones'-complement checksum detects every single-bit error, so the receiving stack silently drops each mangled segment exactly as if it had been lost.
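To confirm the corrupted packets really are being caught and discarded by the stack on whichever side receives them, the kernel's checksum-error counters should climb during a transfer; a sketch assuming iproute2's nstat on a Linux receiver:
nstat -az | grep -i csum    # e.g. TcpInCsumErrors should increase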
Rate limiting
The rate synopsis from the manpage:
rate RATE [PACKETOVERHEAD] [CELLSIZE] [CELLOVERHEAD]
Example: cap ens192 at 10 Mbit/s.
tc qdisc add dev ens192 root netem rate 10Mbit
An iperf3 test:
iperf3 -c 192.168.2.9 -t 30
Connecting to host 192.168.2.9, port 5201
[ 5] local 192.168.7.220 port 56559 connected to 192.168.2.9 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 10.1 MBytes 84.7 Mbits/sec
[ 5] 1.00-2.00 sec 8.50 MBytes 71.5 Mbits/sec
[ 5] 2.00-3.00 sec 10.0 MBytes 83.8 Mbits/sec
[ 5] 3.00-4.01 sec 10.4 MBytes 86.0 Mbits/sec
[ 5] 4.01-5.01 sec 10.2 MBytes 86.0 Mbits/sec
[ 5] 5.01-6.01 sec 7.38 MBytes 61.9 Mbits/sec
[ 5] 6.01-7.01 sec 7.50 MBytes 63.1 Mbits/sec
[ 5] 7.01-8.00 sec 7.75 MBytes 65.5 Mbits/sec
[ 5] 8.00-9.01 sec 7.38 MBytes 61.1 Mbits/sec
[ 5] 9.01-10.01 sec 7.62 MBytes 64.2 Mbits/sec
[ 5] 10.01-11.01 sec 7.88 MBytes 66.1 Mbits/sec
[ 5] 11.01-12.01 sec 7.62 MBytes 64.4 Mbits/sec
[ 5] 12.01-13.02 sec 7.75 MBytes 64.4 Mbits/sec
[ 5] 13.02-14.01 sec 7.75 MBytes 65.1 Mbits/sec
[ 5] 14.01-15.01 sec 7.50 MBytes 62.9 Mbits/sec
[ 5] 15.01-16.01 sec 7.62 MBytes 64.4 Mbits/sec
[ 5] 16.01-17.00 sec 7.88 MBytes 66.4 Mbits/sec
[ 5] 17.00-18.01 sec 8.62 MBytes 71.9 Mbits/sec
[ 5] 18.01-19.00 sec 8.38 MBytes 70.8 Mbits/sec
[ 5] 19.00-20.02 sec 8.25 MBytes 68.2 Mbits/sec
[ 5] 20.02-21.01 sec 8.25 MBytes 69.7 Mbits/sec
[ 5] 21.01-22.00 sec 8.38 MBytes 70.6 Mbits/sec
[ 5] 22.00-23.01 sec 8.38 MBytes 69.4 Mbits/sec
[ 5] 23.01-24.00 sec 7.75 MBytes 65.8 Mbits/sec
[ 5] 24.00-25.01 sec 7.62 MBytes 63.8 Mbits/sec
[ 5] 25.01-26.01 sec 7.75 MBytes 65.0 Mbits/sec
[ 5] 26.01-27.01 sec 7.88 MBytes 65.8 Mbits/sec
[ 5] 27.01-28.01 sec 8.12 MBytes 68.3 Mbits/sec
[ 5] 28.01-29.01 sec 7.88 MBytes 66.1 Mbits/sec
[ 5] 29.01-30.00 sec 7.75 MBytes 65.3 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-30.00 sec 246 MBytes 68.8 Mbits/sec sender
[ 5] 0.00-30.00 sec 246 MBytes 68.6 Mbits/sec receiver
iperf Done.
Annoyingly, the rate cap is nowhere near accurate: the 10Mbit configuration above measures around 68 Mbit/s, and a rate 1Mbit configuration likewise measured far above 1 Mbit/s. The related limit parameter (netem's queue length, in packets) has not been tested yet.
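One plausible culprit, not verified here, is segmentation offload: with TSO/GSO enabled the qdisc sees multi-kilobyte super-packets rather than wire-size frames, a common source of netem inaccuracy. Two things worth trying, both hedged sketches:
# disable the offloads so netem sees real packets
ethtool -K ens192 tso off gso off gro off
# or leave delay to netem and shape with a tbf nested beneath it,
# the combination the classic netem docs recommend
tc qdisc add dev ens192 root handle 1:0 netem delay 10ms
tc qdisc add dev ens192 parent 1:1 handle 10: tbf rate 10mbit burst 32kb latency 400ms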
Dropping the cap all the way to 10kbit even got the iperf3 connection reset at times; a run that finished without a reset is below.
iperf3 -c 192.168.2.9 -t 15
Connecting to host 192.168.2.9, port 5201
[ 5] local 192.168.7.220 port 63087 connected to 192.168.2.9 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 256 KBytes 2.09 Mbits/sec
[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 4.00-5.00 sec 128 KBytes 1.05 Mbits/sec
[ 5] 5.00-6.01 sec 0.00 Bytes 0.00 bits/sec
[ 5] 6.01-7.00 sec 128 KBytes 1.06 Mbits/sec
[ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 8.00-9.01 sec 0.00 Bytes 0.00 bits/sec
[ 5] 9.01-10.01 sec 128 KBytes 1.05 Mbits/sec
[ 5] 10.01-11.01 sec 0.00 Bytes 0.00 bits/sec
[ 5] 11.01-12.01 sec 0.00 Bytes 0.00 bits/sec
[ 5] 12.01-13.01 sec 128 KBytes 1.05 Mbits/sec
[ 5] 13.01-14.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 14.00-15.01 sec 128 KBytes 1.04 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-15.01 sec 896 KBytes 489 Kbits/sec sender
[ 5] 0.00-15.83 sec 768 KBytes 397 Kbits/sec receiver
iperf Done.
Slow I/O
Simulating slow I/O is beyond tc netem. Following a serverfault post, use device mapper to build an artificially slow block device instead.
The tools involved are losetup, blockdev, and dmsetup; dnf provides NAME will tell you which package ships a given command. On openEuler the packages are util-linux and device-mapper.
delay=100   # extra per-I/O delay in ms, applied by the dm delay target
devfile=/tmp/slow-disk-100M
dd if=/dev/zero of=$devfile bs=1024k count=100
device=`losetup --show --find $devfile`
# note: blockdev takes the loop device path that losetup printed
size=`blockdev --getsize $device`
dmsetup create dm-slow --table "0 $size delay $device 0 $delay"
ls /dev/mapper/dm-slow
/dev/mapper/dm-slow
Format and mount the device.
mkfs.ext4 /dev/mapper/dm-slow
mkdir /mnt/slow
mount /dev/mapper/dm-slow /mnt/slow
mke2fs 1.46.4 (18-Aug-2021)
Discarding device blocks: done
Creating filesystem with 102400 1k blocks and 25584 inodes
Filesystem UUID: a640fab2-49e4-4135-967f-cc486efd7e68
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729
Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done
Now measure the write speed.
dd if=/dev/random of=/mnt/slow/testdata.bin bs=1KB count=1024
1024+0 records in
1024+0 records out
1024000 bytes (1.0 MB, 1000 KiB) copied, 0.00664252 s, 154 MB/s
Compare with the baseline speed.
dd if=/dev/random of=/tmp/testdata.bin bs=1KB count=1024
1024+0 records in
1024+0 records out
1024000 bytes (1.0 MB, 1000 KiB) copied, 0.00613006 s, 167 MB/s
Okay, I've had enough. This doesn't work at all, totally hopeless. Giving up on this scenario.
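In hindsight a likely explanation, not retried here: without any sync flag, dd mostly benchmarks the page cache, so the writes return before ever waiting on the dm-delay device. Forcing them through to the device should expose the configured 100 ms penalty; a sketch:
# sync after every write
dd if=/dev/random of=/mnt/slow/testdata.bin bs=1KB count=1024 oflag=dsync
# or flush everything once at the end and include it in the timing
dd if=/dev/random of=/mnt/slow/testdata.bin bs=1KB count=1024 conv=fsync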