Some Curious PyTorch Tensor Operations

notes


 2021/01/20   Share

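The culprit is the gather feature operation that the CenterNet code performs right before computing the loss. Honestly, it left me completely baffled at first sight, and it took me nearly three months to work out what it is actually doing.

Here is the code: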

def _gather_feat(feat, ind, mask=None):
    dim  = feat.size(2)
    ind  = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim)
    feat = feat.gather(1, ind)
    if mask is not None:
        mask = mask.unsqueeze(2).expand_as(feat)
        feat = feat[mask]
        feat = feat.view(-1, dim)
    return feat

def _transpose_and_gather_feat(feat, ind):
    feat = feat.permute(0, 2, 3, 1).contiguous()
    feat = feat.view(feat.size(0), -1, feat.size(3))
    feat = _gather_feat(feat, ind)
    return feat

For example, right before the dep (depth) loss is computed:

dep_loss += self.crit_reg(output['dep'], batch['reg_mask'],
                                  batch['ind'], batch['dep']) / opt.num_stacks

# the first line inside crit_reg is the gather feature step

pred = _transpose_and_gather_feat(output, ind)

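The call takes the depth predicted by the network, plus the reg_mask and ind from the ground truth of one batch produced by the dataloader, plus the ground-truth depth. (Only at the very end did I understand that ind means index. Honestly, I had guessed as much, but dropped into the middle of neural-network code, would you know what it is doing? More on this later.) I was completely lost.

To understand this line you first have to understand the gather feature code shown at the start, so let's begin.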

def _transpose_and_gather_feat(feat, ind):
    feat = feat.permute(0, 2, 3, 1).contiguous()
    feat = feat.view(feat.size(0), -1, feat.size(3))
    feat = _gather_feat(feat, ind)
    return feat

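ind is a tensor of shape (B, max_obj). Within each batch element, every entry marks a point where the heatmap (hm) has a peak, stored as the flattened index ct[0] + ct[1] * img.w, i.e. x + width × y. Because this index addresses the flattened H×W grid of feat, the width here has to be the width W of that grid. More on this later.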
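To make the encoding concrete, here is a minimal sketch. The tensor sizes and the names `x`, `y` and `flat_index` are made up for illustration and are not taken from CenterNet. It builds the flat index of one point and checks that the corresponding row of the flattened B × (H×W) × C tensor is the channel vector at (x, y).

```python
import torch

B, C, H, W = 1, 2, 3, 4   # illustrative sizes, not CenterNet's
feat = torch.arange(B * C * H * W).view(B, C, H, W)

x, y = 2, 1               # hypothetical object centre on the feature map
flat_index = x + W * y    # same encoding as ind: x + width * y

# B C H W -> B H W C -> B (H*W) C, as in _transpose_and_gather_feat
flat = feat.permute(0, 2, 3, 1).contiguous().view(B, -1, C)

# The row at flat_index holds all C channels of pixel (x, y)
assert torch.equal(flat[0, flat_index], feat[0, :, y, x])
```

The flat index only makes sense together with the width it was computed with, which is why it must match the H×W grid that gets flattened.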

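1. The first statement

feat = feat.permute(0, 2, 3, 1).contiguous()

This reorders the tensor's dimensions, turning B C H W into B H W C.

For example, a tensor of torch.Size([1, 2, 3, 4]) becomes torch.Size([1, 3, 4, 2]) after this permute.

Each argument of permute says which of the original dimensions ends up at that position. In the example, the dimension of size 2, originally at position 1 (counting from 0), is moved to position 3 (the last one).

contiguous() makes the tensor's storage contiguous again by copying the data into the new order, because permute only changes how the dimensions are indexed and leaves the physical storage untouched.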
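A small check of the two claims above, as a sketch with an arbitrary all-zeros tensor: permute returns a tensor that shares storage with the original and is no longer contiguous, while contiguous() produces a fresh copy.

```python
import torch

t = torch.zeros(1, 2, 3, 4)          # B C H W
p = t.permute(0, 2, 3, 1)            # B H W C

print(p.shape)                       # torch.Size([1, 3, 4, 2])
print(p.is_contiguous())             # False: only the strides changed
print(p.data_ptr() == t.data_ptr())  # True: same underlying storage

c = p.contiguous()                   # copies the data into B H W C order
print(c.is_contiguous())             # True
print(c.data_ptr() == t.data_ptr())  # False: c owns new storage
```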

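2. The second statement

feat = feat.view(feat.size(0), -1, feat.size(3))

view also changes the shape, but unlike permute it does not reorder any elements. It reinterprets the same storage under a new shape without copying data, so it can flatten or split dimensions, whereas permute can only rearrange the dimensions that already exist.

The argument -1 tells view to infer the size of that dimension automatically.

For example, torch.Size([2, 3, 3, 2]) becomes torch.Size([2, 9, 2]) after the view above.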
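The following sketch (tensor contents are arbitrary) shows that view infers the -1 dimension and shares storage with its input. It also shows one case where view refuses to work: flattening a permuted tensor completely, which would require reordering elements in memory.

```python
import torch

t = torch.arange(36).view(2, 3, 3, 2)   # B H W C, contiguous
v = t.view(t.size(0), -1, t.size(3))    # -1 is inferred as 3 * 3 = 9

print(v.shape)                          # torch.Size([2, 9, 2])
print(v.data_ptr() == t.data_ptr())     # True: no copy was made

# A permuted tensor cannot be flattened to 1-D without moving data
p = torch.arange(36).view(2, 2, 3, 3).permute(0, 2, 3, 1)
try:
    p.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)                      # True

flat = p.contiguous().view(-1)          # works after contiguous()
```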

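3. The third statement

def _gather_feat(feat, ind, mask=None):
    dim  = feat.size(2)
    ind  = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim)
    feat = feat.gather(1, ind)
    if mask is not None:
        mask = mask.unsqueeze(2).expand_as(feat)
        feat = feat[mask]
        feat = feat.view(-1, dim)
    return feat

ind = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim)

Keep in mind that ind has shape [batch, max_obj], so it is 2-D. max_obj is the configured maximum number of objects in one image and can be set freely. After the two statements above, the input feat has shape (B, -1, C), where the -1 stands for H×W, so it is 3-D.

The number of dimensions has to match, hence unsqueeze(2), which adds a new axis at the third position.

For example:

[[2,0,0,0], [1,3,4,5]] ==> [ [ [2],[0],[0],[0] ], [ [1],[3],[4],[5] ] ] size(2,4,1)

With unsqueeze(0) ==> [ [ [2,0,0,0], [1,3,4,5] ] ] size(1,2,4)

With unsqueeze(1) ==> [ [ [2,0,0,0] ], [ [1,3,4,5] ] ] size(2,1,4)

Now ind is 3-D, and expand then repeats each index along the new last axis until it has length dim (that is, C), so ind has the shape (B, max_obj, C).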
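The shapes from the example can be verified directly. In this sketch `dim = 3` is an arbitrary stand-in for the channel count C.

```python
import torch

ind = torch.tensor([[2, 0, 0, 0], [1, 3, 4, 5]])  # shape (2, 4)
dim = 3                                           # stands in for C

print(ind.unsqueeze(0).shape)   # torch.Size([1, 2, 4])
print(ind.unsqueeze(1).shape)   # torch.Size([2, 1, 4])
print(ind.unsqueeze(2).shape)   # torch.Size([2, 4, 1])

# expand repeats the size-1 axis without copying memory (stride 0)
expanded = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim)
print(expanded.shape)           # torch.Size([2, 4, 3])
print(expanded[1, 2])           # tensor([4, 4, 4])
```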

4. Now for the epic hard part!!!

feat = feat.gather(1, ind)

The official documentation explains it like this:

Gathers values along an axis specified by dim.

For a 3-D tensor the output is specified by:

out[i][j][k] = input[index[i][j][k]][j][k]  # if dim == 0
out[i][j][k] = input[i][index[i][j][k]][k]  # if dim == 1
out[i][j][k] = input[i][j][index[i][j][k]]  # if dim == 2

If input is an n-dimensional tensor with size (x_0, x_1, …, x_{i-1}, x_i, x_{i+1}, …, x_{n-1}) and dim = i, then index must be an n-dimensional tensor with size (x_0, x_1, …, x_{i-1}, y, x_{i+1}, …, x_{n-1}) where y ≥ 1 and out will have the same size as index.

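Dizzy already? Put into words: out has the same shape as index, and each output element is read from input at the same coordinates, except that the coordinate along the chosen dim is replaced by the value stored at that position in index.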
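The dim == 1 rule can be written out as plain loops. This is only an illustrative sketch (the helper name `gather_dim1` is made up, and it is far slower than the real thing), but it is compared against torch's own gather to confirm that both agree.

```python
import torch

def gather_dim1(inp, index):
    """Loop version of inp.gather(1, index) for 3-D tensors, for illustration."""
    out = torch.empty(index.shape, dtype=inp.dtype)
    for i in range(index.size(0)):
        for j in range(index.size(1)):
            for k in range(index.size(2)):
                # only the coordinate along dim 1 comes from index
                out[i, j, k] = inp[i, index[i, j, k], k]
    return out

inp = torch.arange(36).view(2, 9, 2)
index = torch.tensor([[2, 5, 6], [0, 8, 1]]).unsqueeze(2).expand(2, 3, 2)

result = gather_dim1(inp, index)
assert torch.equal(result, inp.gather(1, index))
```

Because every index is repeated along the last axis, each output row is a whole row of `inp`, which is exactly the "pick these pixels, keep all their channels" behaviour used by _gather_feat.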

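To see why gather can pull the keypoints out of the B×H×W×C tensor using ind, keep in mind that ind records, for a point on the original H×W grid, the value x + W × y. Then read what gather does with that in mind. It is one of those things that is easier to grasp than to put into words. Saying it like this feels like stating the obvious, but I am still at the intuitive stage myself, and an element-by-element derivation is a bit tedious.

Suppose an input of size (1, H, W, C) = (1, 3, 3, 2) looks as follows, with the C values of each pixel written in one cell.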

0,1	2,3	4,5
6,7	8,9	10,11
12,13	14,15	16,17

Now let ind be [2, 5, 6]

and run the following experiment:

>>> import numpy as np
>>> import torch
>>> input = np.empty([2,3,3,2],dtype=np.int64)
>>> count = 0
>>> for x in np.nditer(input,op_flags=['readwrite']):
...     x[...] = count
...     count += 1
...
>>> input = torch.from_numpy(input)
>>> input
tensor([[[[ 0,  1],
          [ 2,  3],
          [ 4,  5]],

         [[ 6,  7],
          [ 8,  9],
          [10, 11]],

         [[12, 13],
          [14, 15],
          [16, 17]]],


        [[[18, 19],
          [20, 21],
          [22, 23]],

         [[24, 25],
          [26, 27],
          [28, 29]],

         [[30, 31],
          [32, 33],
          [34, 35]]]])
>>> ind = np.zeros([2,5],dtype=np.int64)
>>> ind[0][0] = 2
>>> ind[0][1] = 5
>>> ind[0][2] = 6
>>> ind
array([[2, 5, 6, 0, 0],
       [0, 0, 0, 0, 0]])
>>> ind = torch.from_numpy(ind)
>>> input2 = input.view(input.size(0),-1,input.size(3))
>>> input2.shape
torch.Size([2, 9, 2])
>>> ind2  = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), input2.size(2))
>>> ind2
tensor([[[2, 2],
         [5, 5],
         [6, 6],
         [0, 0],
         [0, 0]],

        [[0, 0],
         [0, 0],
         [0, 0],
         [0, 0],
         [0, 0]]])
>>> feat = input2.gather(1,ind2)
>>> feat
tensor([[[ 4,  5],
         [10, 11],
         [12, 13],
         [ 0,  1],
         [ 0,  1]],

        [[18, 19],
         [18, 19],
         [18, 19],
         [18, 19],
         [18, 19]]])

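Look at the first batch of feat: the first three rows are exactly cells 2, 5 and 6 of the table (counting from 0)! This gives us the feature pixels we need. As for the other rows, we use the reg_mask passed in earlier. It is 1 for the slots that hold a real object (an hm peak) and 0 for the remaining padding slots, so the rest of feat is set to 0 and contributes nothing to the loss during training.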
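A minimal sketch of how such a mask can keep the padding slots out of the loss, assuming the mask is simply multiplied into both the prediction and the target before an L1 loss. The function name `masked_l1`, the example values and the normalisation are illustrative and not copied from the CenterNet code.

```python
import torch
import torch.nn.functional as F

def masked_l1(pred, target, mask):
    """pred, target: (B, max_obj, C); mask: (B, max_obj), 1 for real objects."""
    mask = mask.unsqueeze(2).expand_as(pred).float()
    # masked-out slots become 0 on both sides, so their error is 0
    loss = F.l1_loss(pred * mask, target * mask, reduction='sum')
    return loss / (mask.sum() + 1e-4)   # average over the unmasked entries

pred = torch.tensor([[[1.0], [5.0], [9.0]]])     # slot 2 is padding
target = torch.tensor([[[2.0], [3.0], [0.0]]])
mask = torch.tensor([[1, 1, 0]])

loss = masked_l1(pred, target, mask)
print(loss)
```

The padding slot has a large raw error (9 vs 0) but does not affect the result, which is the point of passing reg_mask along with ind.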

By now I trust you have the intuition. If anything is still unclear, a small experiment of your own should settle it ^V^

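Update, 2023.5.25
While running some newer code, this line suddenly raised an error.

img2lidars = data['lidar2img'].inverse()

RuntimeError: CUDA error: operation not supported when calling `cusparseCreate(handle)`

After some searching, it turned out to be a problem with the 4090 GPU; other people online report no such problem on a 3090. So I changed the code slightly.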

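device = data['lidar2img'][0].device
# NumPy only works on CPU memory, so copy the matrices to the CPU first
img2lidars = data['lidar2img'].cpu().numpy()
img2lidars = np.linalg.inv(img2lidars)
# coords is a tensor from the surrounding code; new_tensor reuses its dtype and device
img2lidars = coords.new_tensor(img2lidars).to(device=device)

That got it running. NumPy can only operate on CPU data, so the tensor has to be moved over first. There is no way around it, given that inverse currently raises this error with CUDA 11.1 on a 4090.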
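The same detour can also be taken without NumPy. This is a sketch under two assumptions: PyTorch is recent enough to provide torch.linalg.inv (1.8 or later), and the matrices are square and invertible. The helper name `inverse_via_cpu` and the example matrices are hypothetical stand-ins for data['lidar2img'].

```python
import torch

def inverse_via_cpu(mats):
    """Invert a batch of square matrices on the CPU, then move back to the original device."""
    return torch.linalg.inv(mats.cpu()).to(mats.device)

# Hypothetical stand-in for data['lidar2img']: two 4x4 transforms with a translation
lidar2img = torch.eye(4).repeat(2, 1, 1)
lidar2img[:, :3, 3] = torch.tensor([1.0, 2.0, 3.0])

img2lidars = inverse_via_cpu(lidar2img)

# inverse @ original should give the identity for every matrix in the batch
identity = torch.eye(4).expand(2, 4, 4)
assert torch.allclose(img2lidars @ lidar2img, identity, atol=1e-6)
```

Staying in torch keeps the dtype of the input and avoids the round trip through a NumPy array, at the same cost of one device-to-host copy.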
