本文总结记录一些衡量压缩算法速度/复杂度的一些指标。
FLOPs
FLOPs全称为floating point operations,意指浮点运算数,理解为计算量,可以用来衡量算法/模型的复杂度。
与FLOPS,flops or flop/s区分开来,它们是floating point operations per second的缩写,意指每秒(最多)浮点运算次数,理解为计算速度,常用来衡量硬件性能。wiki
计算原理示例
分别以卷积层和全连接层为例介绍FLOPs的计算原理 参考:
卷积层
设卷积核输出特征图尺寸为\(C_o \times H_o \times W_o\),输入特征图通道数为\(C_i\),则
对于输出特征图的每一个pixel,需要的乘法次数为\(C_i \times K^2\),加法次数为\(C_i \times K^2 - 1\),共为\(C_i \times K^2 + C_i \times K^2 - 1 = 2 \times C_i \times K^2 - 1\).
因此考虑输出特征图上的所有pixel,总的FLOPs为 \(FLOPs_{conv} = (2 \times C_i \times K^2 - 1) \times H_o \times W_o \times C_o\)
上述计算没有考虑bias,如果考虑bias,则对输出特征图的每一个pixel会增加一次加法,有\(FLOPs= (2 \times C_i \times K^2) \times H_o \times W_o \times C_o\)
全连接层
设全连接层输入个数为\(I\),输出个数为\(O\),则对每一个输出,都有\(I\)次乘法与\(I - 1\)次加法,因此考虑所有输出,总的FLOPs为 \(FLOPs_{fc} = (2 \times I - 1) \times O\)
上述计算同样没有考虑bias,如果考虑bias,则每一个输出都会增加一次加法,有\(FLOPs= (2 \times I ) \times O\)
注:上述计算均没有考虑输入的Batch size维度。
自动计算的工具
sovrasov/flops-counter.pytorch
- Flops counter for convolutional networks in pytorch framework
使用示例 (经微调,以repo提供为准)
1
2
3
4
5
6
7
8
9
import torchvision.models as models
import torch
from ptflops import get_model_complexity_info
net = models.alexnet()
macs, params = get_model_complexity_info(model=net, input_res=(3, 224, 224),
as_strings=True, print_per_layer_stat=True, verbose=True)
print('{:<30} {:<8}'.format('Computational complexity: ', macs))
print('{:<30} {:<8}'.format('Number of parameters: ', params))
将返回各计算单元的计算量及比例,如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
AlexNet(
61.101 M, 100.000% Params, 0.716 GMac, 100.000% MACs,
(features): Sequential(
2.47 M, 4.042% Params, 0.657 GMac, 91.804% MACs,
(0): Conv2d(0.023 M, 0.038% Params, 0.07 GMac, 9.848% MACs, 3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
(1): ReLU(0.0 M, 0.000% Params, 0.0 GMac, 0.027% MACs, inplace=True)
(2): MaxPool2d(0.0 M, 0.000% Params, 0.0 GMac, 0.027% MACs, kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
...
(10): Conv2d(0.59 M, 0.966% Params, 0.1 GMac, 13.936% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(0.0 M, 0.000% Params, 0.0 GMac, 0.006% MACs, inplace=True)
(12): MaxPool2d(0.0 M, 0.000% Params, 0.0 GMac, 0.006% MACs, kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(avgpool): AdaptiveAvgPool2d(0.0 M, 0.000% Params, 0.0 GMac, 0.001% MACs, output_size=(6, 6))
(classifier): Sequential(
58.631 M, 95.958% Params, 0.059 GMac, 8.195% MACs,
(0): Dropout(0.0 M, 0.000% Params, 0.0 GMac, 0.000% MACs, p=0.5, inplace=False)
...
(6): Linear(4.097 M, 6.705% Params, 0.004 GMac, 0.573% MACs, in_features=4096, out_features=1000, bias=True)
)
)
Computational complexity: 0.72 GMac
Number of parameters: 61.1 M