Deep Learning (DL): Convolutional Neural Networks (CNN), from Principles to Implementation

This post was last edited by 东方熊3 on 2017-12-5 22:32

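Deep learning is extremely popular right now. Although I have taken a deep learning course and run some experiments with Keras, I never felt that I understood it thoroughly. I recently studied the work of earlier practitioners and researchers carefully and compiled this article from it; many thanks to them for sharing so generously. I hope it is useful to others as well.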
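1. Preface
(1) Shortcomings of neural networks
The article on neural networks gave a brief introduction to how they work. There, adjacent layers are fully connected, so as the network grows deeper and the number of nodes increases, problems such as overfitting and an excessive number of parameters arise.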
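(2) Background from computer vision (images)
  • Local features that depend only on small subregions of the image are extracted, and the information from these features is merged in later processing stages to detect higher-level features, eventually yielding information about the image as a whole.
  • Pixels that are close together are much more strongly correlated than pixels that are far apart.
  • A local feature that is useful in one region of the image is likely to be useful in other regions too, for example when the object of interest is translated.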
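2. Properties of convolutional neural networks (CNN)
Following from the two points in the preface, this section introduces two properties of convolutional neural networks.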


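(1) Local receptive fields
Figure 1: a fully connected network. If layer L1 is a 1000×1000-pixel image and layer L2 has 1,000,000 hidden neurons, each connected to every pixel of L1, there are 1000×1000×1,000,000 = 10^12 connections, i.e. 10^12 weight parameters.
Figure 2: a locally connected network. Each L2 node is connected only to a 10×10 window around the corresponding position in L1, so the 1,000,000 hidden neurons need only 10^6 × 100 = 10^8 parameters. The number of weights drops by four orders of magnitude.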
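(2) Weight sharing
In Figure 2, weight sharing does not mean that all the connections drawn in red have the same weight. It means that every line of another color has a counterpart among the red lines with an equal weight, so every node in the second layer convolves the previous layer with the same parameters.
In Figure 2 each hidden neuron is connected to a 10×10 image region, so each neuron has 10×10 = 100 connection weights. What if these 100 parameters are the same for every neuron? In other words, every neuron convolves the image with the same kernel. The connections from L1 to L2 then need only 100 parameters in total. But a single kernel extracts only one kind of feature. To extract different features, we add more kernels: with 100 kernels, for example, there are 100 × 100 = 10,000 parameters.
Each kernel has its own parameters, so it extracts a different feature of the input image (for example, edges in different orientations). Convolving the image with each kernel yields the response of the image to one particular feature, which we call a feature map.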
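The parameter counts quoted in this section can be re-derived with a few lines of arithmetic. This is only a sanity check of the numbers above; the variable names are my own, and biases are ignored, as in the text.

```python
# Parameter counts for the 1000x1000 image example (biases ignored).
image_pixels = 1000 * 1000   # L1: 1000x1000 input image
hidden_units = 1000 * 1000   # L2: 1,000,000 hidden neurons
window = 10 * 10             # each hidden neuron sees a 10x10 window
num_kernels = 100            # number of distinct kernels (feature maps)

fully_connected = image_pixels * hidden_units  # every neuron sees every pixel
locally_connected = hidden_units * window      # every neuron owns its 10x10 weights
shared_one_kernel = window                     # all neurons reuse one 10x10 kernel
shared_all_kernels = num_kernels * window      # 100 kernels of 10x10 weights each

for label, count in [("fully connected", fully_connected),
                     ("locally connected", locally_connected),
                     ("shared, 1 kernel", shared_one_kernel),
                     ("shared, 100 kernels", shared_all_kernels)]:
    print(label, count)
```

Local connectivity removes four orders of magnitude, and weight sharing removes another four even with 100 kernels.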
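3. Network structure
Take LeCun's LeNet-5 as an example. Not counting the input, LeNet-5 has 7 layers, each containing connection weights (trainable parameters). The input image is 32×32. One point to keep in mind: each layer has several feature maps, each feature map extracts one kind of feature from its input with one convolution filter, and each feature map contains many neurons.
C1, C3 and C5 are convolutional layers, S2 and S4 are subsampling layers, and F6 is a fully connected layer, followed by the output layer. Subsampling exploits the local correlation of images: it reduces the amount of data to process while preserving useful information.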


Figure 3
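To get a feel for how the map sizes shrink through the network, here is a small sketch that walks a 32×32 input through the convolutional and subsampling layers. The kernel size (5×5), the subsampling factor (2) and the map counts (6, 16, 120) are not stated in the text above; they are taken from LeCun's LeNet-5 paper and should be read as my assumption about what Figure 3 shows. The helper names are my own.

```python
def conv_out(size, kernel):
    # "valid" convolution: the output shrinks by kernel - 1
    return size - kernel + 1

def pool_out(size, scale):
    # non-overlapping subsampling by an integer factor
    return size // scale

size = 32  # input image is 32x32
shapes = []
for name, kind, maps in [("C1", "conv", 6), ("S2", "pool", 6),
                         ("C3", "conv", 16), ("S4", "pool", 16),
                         ("C5", "conv", 120)]:
    size = conv_out(size, 5) if kind == "conv" else pool_out(size, 2)
    shapes.append((name, maps, size, size))

for name, maps, rows, cols in shapes:
    print(f"{name}: {maps} maps of {rows}x{cols}")
```

Under these assumptions C5 ends up with 1×1 maps, which is why it behaves like a fully connected layer feeding F6.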
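4. Forward propagation
The article on neural networks already covered forward propagation through fully connected and activation layers in detail, so this section focuses on the convolutional and subsampling (pooling) layers.
(1) Convolutional layer
As shown in Figure 4, the input is a 5×5 image, which is convolved with a 3×3 kernel. Each output value is essentially a dot product between the kernel and the image window it covers. For example: 1×1+0×1+1×1+0×0+1×1+0×1+1×0+0×0+1×1=4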


Figure 4

import numpy as np

def conv2(X, k):
    # "valid" sliding-window product of image X with kernel k (the kernel is not flipped)
    x_row, x_col = X.shape
    k_row, k_col = k.shape
    ret_row, ret_col = x_row - k_row + 1, x_col - k_col + 1
    ret = np.empty((ret_row, ret_col))
    for y in range(ret_row):
        for x in range(ret_col):
            sub = X[y : y + k_row, x : x + k_col]
            ret[y, x] = np.sum(sub * k)
    return ret

class ConvLayer:
    def __init__(self, in_channel, out_channel, kernel_size):
        self.w = np.random.randn(in_channel, out_channel, kernel_size, kernel_size)
        self.b = np.zeros((out_channel))

    def _relu(self, x):
        x[x < 0] = 0
        return x

    def forward(self, in_data):
        # assume the first index is channel index
        in_channel, in_row, in_col = in_data.shape
        out_channel, kernel_row, kernel_col = self.w.shape[1], self.w.shape[2], self.w.shape[3]
        self.top_val = np.zeros((out_channel, in_row - kernel_row + 1, in_col - kernel_col + 1))
        for j in range(out_channel):
            # sum the responses of all input channels, then add the bias and apply ReLU
            for i in range(in_channel):
                self.top_val[j] += conv2(in_data[i], self.w[i, j])
            self.top_val[j] += self.b[j]
            self.top_val[j] = self._relu(self.top_val[j])
        return self.top_val
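As a quick check of the sliding-window product, the sketch below runs it on a concrete 5×5 image. The pixel values are my own assumption (Figure 4 is not reproduced in the text); they are chosen so that the top-left window gives exactly the sum 1×1+0×1+1×1+0×0+1×1+0×1+1×0+0×0+1×1=4 worked out above. The helper is a compact restatement of conv2 so that the block runs on its own.

```python
import numpy as np

def conv2(X, k):
    # "valid" sliding-window product: no padding, no kernel flip
    rows = X.shape[0] - k.shape[0] + 1
    cols = X.shape[1] - k.shape[1] + 1
    out = np.empty((rows, cols))
    for y in range(rows):
        for x in range(cols):
            out[y, x] = np.sum(X[y:y + k.shape[0], x:x + k.shape[1]] * k)
    return out

# Assumed example image; only the top-left 3x3 window is pinned down by the text.
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature_map = conv2(image, kernel)
print(feature_map)  # 3x3 feature map; the top-left entry is 4
```

A 5×5 input and a 3×3 kernel give a (5-3+1)×(5-3+1) = 3×3 output, which matches the shape rule used in conv2.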
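(2) Subsampling (pooling) layer
Subsampling, i.e. pooling, shrinks the feature maps; the pooling window is usually 2×2. Common pooling methods are:
  • Max pooling. See Figure 5.
  • Mean pooling. See Figure 6.
  • Gaussian pooling, which borrows the idea of Gaussian blur.
  • Trainable pooling: a function f is trained that takes 4 points as input and outputs 1 point.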


Figure 5


Figure 6
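Max pooling and mean pooling with a 2×2 window can be sketched in a few lines of NumPy. This is a minimal illustration that assumes a single feature map whose height and width are both even; the function name pool2x2 and the sample values are my own, not taken from the figures.

```python
import numpy as np

def pool2x2(feature_map, mode="max"):
    # Split the map into non-overlapping 2x2 blocks: blocks[i, a, j, b] = map[2i+a, 2j+b]
    rows, cols = feature_map.shape
    blocks = feature_map.reshape(rows // 2, 2, cols // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

fm = np.array([[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]], dtype=float)

max_pooled = pool2x2(fm, "max")
mean_pooled = pool2x2(fm, "mean")
print(max_pooled)   # [[6. 8.] [3. 4.]]
print(mean_pooled)  # [[3.25 5.25] [2. 2.]]
```

Both variants halve each dimension; they differ only in how the four values of a block are reduced to one.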
class MaxPoolingLayer:
    def __init__(self, kernel_size, name='MaxPool'):
        self.kernel_size = kernel_size

    def forward(self, in_data):
        in_batch, in_channel, in_row, in_col = in_data.shape
        k = self.kernel_size
        # round up, so a partial window at the border still produces an output
        out_row = in_row // k + (1 if in_row % k != 0 else 0)
        out_col = in_col // k + (1 if in_col % k != 0 else 0)

        # flag marks the position of the maximum inside each window
        self.flag = np.zeros_like(in_data)
        ret = np.empty((in_batch, in_channel, out_row, out_col))
        for b_id in range(in_batch):
            for c in range(in_channel):
                for oy in range(out_row):
                    for ox in range(out_col):
                        height = k if (oy + 1) * k <= in_row else in_row - oy * k
                        width = k if (ox + 1) * k <= in_col else in_col - ox * k
                        idx = np.argmax(in_data[b_id, c, oy * k: oy * k + height, ox * k: ox * k + width])
                        # argmax returns a flat index; convert it back to (row, column)
                        offset_r = idx // width
                        offset_c = idx % width
                        self.flag[b_id, c, oy * k + offset_r, ox * k + offset_c] = 1
                        ret[b_id, c, oy, ox] = in_data[b_id, c, oy * k + offset_r, ox * k + offset_c]
        return ret

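5. Backpropagation
The article on neural networks already covered backpropagation through fully connected and activation layers in detail, so this section focuses on the convolutional and subsampling (pooling) layers.
(1) Convolutional layer
Suppose convolutional layer L is followed by a subsampling layer (L+1) whose residual we have already computed; we now compute the residual of the convolutional layer. From the network structure diagram above, each map in subsampling layer (L+1) is 1/(scale×scale) the size of the corresponding map in convolutional layer L (take scale = 2 as an example), while the two layers have the same number of maps. Four units in a map of layer L are tied to one unit of the corresponding map in layer L+1. We can therefore expand the subsampling layer's residual by taking its Kronecker product with a scale×scale matrix of ones, so that the residual has the same dimensions as the output maps of the layer before it.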
Expansion process:


Figure 7
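The Kronecker-product expansion is a single call in NumPy. The sketch below assumes scale = 2 and uses made-up residual values; how the expanded values are then weighted depends on the pooling type and is not shown here.

```python
import numpy as np

scale = 2
# Residual of one map in the subsampling layer (illustrative values)
pooled_residual = np.array([[1.0, 2.0],
                            [3.0, 4.0]])

# Each entry is copied into a scale x scale block, restoring the conv map's size
expanded = np.kron(pooled_residual, np.ones((scale, scale)))
print(expanded)
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]
```

The 2×2 residual becomes 4×4, matching the size of the convolutional layer's output map.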
Computing the residual of the convolutional layer with a convolution:


Figure 8
def backward(self, residual):
    # method of ConvLayer; residual has shape (batch, out_channel, out_row, out_col)
    in_channel, out_channel, kernel_size = self.w.shape[0], self.w.shape[1], self.w.shape[2]
    in_batch = residual.shape[0]
    # gradient_b
    self.gradient_b = residual.sum(axis=3).sum(axis=2).sum(axis=0) / in_batch
    # gradient_w: correlate each input map with the matching output residual
    self.gradient_w = np.zeros_like(self.w)
    for b_id in range(in_batch):
        for i in range(in_channel):
            for o in range(out_channel):
                self.gradient_w[i, o] += conv2(self.bottom_val[b_id, i], residual[b_id, o])
    self.gradient_w /= in_batch
    # gradient_x: per-sample residual for the layer below (not averaged over the batch)
    gradient_x = np.zeros_like(self.bottom_val)
    for b_id in range(in_batch):
        for i in range(in_channel):
            for o in range(out_channel):
                gradient_x[b_id, i] += conv2(padding(residual[b_id, o], kernel_size - 1), rot180(self.w[i, o]))
    # update (SGD with momentum)
    self.prev_gradient_w = self.prev_gradient_w * self.momentum - self.gradient_w
    self.w += self.lr * self.prev_gradient_w
    self.prev_gradient_b = self.prev_gradient_b * self.momentum - self.gradient_b
    self.b += self.lr * self.prev_gradient_b
    return gradient_x

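(2) Subsampling (pooling) layer
Suppose subsampling layer L is followed by a convolutional layer (L+1) whose residual we have already computed; we now compute the residual of layer L. The connections from a subsampling layer to a convolutional layer carry weights and biases, so this case is not as simple as going from a convolutional layer to a subsampling layer. Suppose the j-th map Mj of layer L is connected to map M2j of layer L+1. By the backpropagation rule, the residual Dj of layer L is a weighted sum of the residual D2j of layer L+1. The difficulty is working out which units of M2j are connected to which units of Mj, and through which weights. Two small transformations are needed here (rot180° and padding):
rot180°: rotation. Rotates a matrix by 180 degrees (this can be done by reversing the order of the rows and then the order of the columns).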
def rot180(in_data):
    # reverse the row order and the column order; copy so the result does not alias the input
    return in_data[::-1, ::-1].copy()
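The claim that a 180° rotation is just a row reversal followed by a column reversal is easy to verify. The sketch below uses NumPy's flipud and fliplr and compares the result with np.rot90 applied twice; the function name rot180_by_flips is my own.

```python
import numpy as np

def rot180_by_flips(m):
    # reverse the row order, then the column order
    return np.fliplr(np.flipud(m))

m = np.arange(1, 10).reshape(3, 3)
rotated = rot180_by_flips(m)
print(rotated)
# [[9 8 7]
#  [6 5 4]
#  [3 2 1]]
```

Note that the rows and columns have to be swapped, not just copied from one half to the other; copying alone would produce a mirrored matrix instead of a rotated one.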


padding: surrounds the matrix with a border of zeros, size entries wide, on every side
def padding(in_data, size):
    cur_r, cur_w = in_data.shape[0], in_data.shape[1]
    new_r = cur_r + size * 2
    new_w = cur_w + size * 2
    ret = np.zeros((new_r, new_w))
    ret[size:cur_r + size, size:cur_w + size] = in_data
    return ret



Figure 9
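To convince myself that "pad the residual, then convolve with the rotated kernel" really gives the residual of the lower layer, I find a numerical check helpful. The sketch below is a single-channel illustration under my own assumptions: the loss is taken to be sum(residual * conv2(x, w)), the sizes are 5×5 and 3×3, and np.pad plus slicing stand in for the padding and rot180 helpers.

```python
import numpy as np

def conv2(X, k):
    # "valid" sliding-window product, same convention as in the forward pass
    rows = X.shape[0] - k.shape[0] + 1
    cols = X.shape[1] - k.shape[1] + 1
    out = np.empty((rows, cols))
    for y in range(rows):
        for x in range(cols):
            out[y, x] = np.sum(X[y:y + k.shape[0], x:x + k.shape[1]] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))         # output map of the lower layer
w = rng.standard_normal((3, 3))         # kernel of the conv layer
residual = rng.standard_normal((3, 3))  # residual at the conv output (5 - 3 + 1 = 3)

# Analytic result: zero-pad by kernel_size - 1, convolve with the 180-degree rotated kernel
padded = np.pad(residual, w.shape[0] - 1)
analytic = conv2(padded, w[::-1, ::-1])

# Numerical result: central differences on loss = sum(residual * conv2(x, w))
eps = 1e-6
numeric = np.zeros_like(x)
for r in range(x.shape[0]):
    for c in range(x.shape[1]):
        x_plus = x.copy()
        x_plus[r, c] += eps
        x_minus = x.copy()
        x_minus[r, c] -= eps
        numeric[r, c] = np.sum(residual * (conv2(x_plus, w) - conv2(x_minus, w))) / (2 * eps)

max_error = np.max(np.abs(analytic - numeric))
print(max_error)  # should be tiny, limited only by floating-point rounding
```

The padded residual is 7×7, so the result is 7-3+1 = 5 rows and columns, the same size as the lower layer's map.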
6. Core code (demo version)
import numpy as np
import sys

def conv2(X, k):
    # as a demo code, here we ignore the shape check
    x_row, x_col = X.shape
    k_row, k_col = k.shape
    ret_row, ret_col = x_row - k_row + 1, x_col - k_col + 1
    ret = np.empty((ret_row, ret_col))
    for y in range(ret_row):
        for x in range(ret_col):
            sub = X[y : y + k_row, x : x + k_col]
            ret[y, x] = np.sum(sub * k)
    return ret

def rot180(in_data):
    # reverse the row order and the column order; copy so the result does not alias the input
    return in_data[::-1, ::-1].copy()

def padding(in_data, size):
    # surround a 2-D array with a border of zeros, size entries wide
    cur_r, cur_w = in_data.shape[0], in_data.shape[1]
    new_r = cur_r + size * 2
    new_w = cur_w + size * 2
    ret = np.zeros((new_r, new_w))
    ret[size:cur_r + size, size:cur_w + size] = in_data
    return ret

def discreterize(in_data, size):
    # one-hot encode: row i gets a 1 in column in_data[i]
    num = in_data.shape[0]
    ret = np.zeros((num, size))
    for i, idx in enumerate(in_data):
        ret[i, idx] = 1
    return ret

class ConvLayer:
    def __init__(self, in_channel, out_channel, kernel_size, lr=0.01, momentum=0.9, name='Conv'):
        self.w = np.random.randn(in_channel, out_channel, kernel_size, kernel_size)
        self.b = np.zeros((out_channel))
        self.layer_name = name
        self.lr = lr
        self.momentum = momentum

        self.prev_gradient_w = np.zeros_like(self.w)
        self.prev_gradient_b = np.zeros_like(self.b)
    # def _relu(self, x):
    #     x[x < 0] = 0
    #     return x
    def forward(self, in_data):
        # assume the first index is the batch index and the second is the channel index
        print('conv forward:' + str(in_data.shape))
        in_batch, in_channel, in_row, in_col = in_data.shape
        out_channel, kernel_size = self.w.shape[1], self.w.shape[2]
        self.top_val = np.zeros((in_batch, out_channel, in_row - kernel_size + 1, in_col - kernel_size + 1))
        self.bottom_val = in_data  # kept for the backward pass

        for b_id in range(in_batch):
            for o in range(out_channel):
                for i in range(in_channel):
                    self.top_val[b_id, o] += conv2(in_data[b_id, i], self.w[i, o])
                self.top_val[b_id, o] += self.b[o]
        return self.top_val

    def backward(self, residual):
        # residual has shape (batch, out_channel, out_row, out_col)
        in_channel, out_channel, kernel_size = self.w.shape[0], self.w.shape[1], self.w.shape[2]
        in_batch = residual.shape[0]
        # gradient_b
        self.gradient_b = residual.sum(axis=3).sum(axis=2).sum(axis=0) / in_batch
        # gradient_w: correlate each input map with the matching output residual
        self.gradient_w = np.zeros_like(self.w)
        for b_id in range(in_batch):
            for i in range(in_channel):
                for o in range(out_channel):
                    self.gradient_w[i, o] += conv2(self.bottom_val[b_id, i], residual[b_id, o])
        self.gradient_w /= in_batch
        # gradient_x: per-sample residual for the layer below (not averaged over the batch)
        gradient_x = np.zeros_like(self.bottom_val)
        for b_id in range(in_batch):
            for i in range(in_channel):
                for o in range(out_channel):
                    gradient_x[b_id, i] += conv2(padding(residual[b_id, o], kernel_size - 1), rot180(self.w[i, o]))
        # update (SGD with momentum)
        self.prev_gradient_w = self.prev_gradient_w * self.momentum - self.gradient_w
        self.w += self.lr * self.prev_gradient_w
        self.prev_gradient_b = self.prev_gradient_b * self.momentum - self.gradient_b
        self.b += self.lr * self.prev_gradient_b
        return gradient_x

class FCLayer:
    def __init__(self, in_num, out_num, lr=0.01, momentum=0.9):
        self._in_num = in_num
        self._out_num = out_num
        self.w = np.random.randn(in_num, out_num)
        self.b = np.zeros((out_num, 1))
        self.lr = lr
        self.momentum = momentum
        self.prev_grad_w = np.zeros_like(self.w)
        self.prev_grad_b = np.zeros_like(self.b)
    # def _sigmoid(self, in_data):
    #     return 1 / (1 + np.exp(-in_data))
    def forward(self, in_data):
        # in_data is assumed to be (batch, in_num), as produced by FlattenLayer
        print('fc forward=' + str(in_data.shape))
        self.topVal = np.dot(in_data, self.w) + self.b.T
        self.bottomVal = in_data
        return self.topVal
    def backward(self, loss):
        # loss is the residual from the next layer, shape (batch, out_num)
        batch_size = loss.shape[0]

        # residual_z = loss * self.topVal * (1 - self.topVal)
        grad_w = np.dot(self.bottomVal.T, loss) / batch_size
        # one bias gradient per output unit, summed over the batch
        grad_b = np.sum(loss, axis=0).reshape(-1, 1) / batch_size
        residual_x = np.dot(loss, self.w.T)
        self.prev_grad_w = self.prev_grad_w * self.momentum - grad_w
        self.prev_grad_b = self.prev_grad_b * self.momentum - grad_b
        # prev_grad_* already carries the minus sign, so add it (as in ConvLayer)
        self.w += self.lr * self.prev_grad_w
        self.b += self.lr * self.prev_grad_b
        return residual_x

class ReLULayer:
    def __init__(self, name='ReLU'):
        pass

    def forward(self, in_data):
        self.top_val = in_data
        ret = in_data.copy()
        ret[ret < 0] = 0
        return ret
    def backward(self, residual):
        gradient_x = residual.copy()
        gradient_x[self.top_val < 0] = 0
        return gradient_x

class MaxPoolingLayer:
    def __init__(self, kernel_size, name='MaxPool'):
        self.kernel_size = kernel_size

    def forward(self, in_data):
        in_batch, in_channel, in_row, in_col = in_data.shape
        k = self.kernel_size
        # ceiling division, so a partial window at the border is still pooled
        out_row = in_row // k + (1 if in_row % k != 0 else 0)
        out_col = in_col // k + (1 if in_col % k != 0 else 0)

        # flag marks the position of the maximum inside each window
        self.flag = np.zeros_like(in_data)
        ret = np.empty((in_batch, in_channel, out_row, out_col))
        for b_id in range(in_batch):
            for c in range(in_channel):
                for oy in range(out_row):
                    for ox in range(out_col):
                        height = k if (oy + 1) * k <= in_row else in_row - oy * k
                        width = k if (ox + 1) * k <= in_col else in_col - ox * k
                        idx = np.argmax(in_data[b_id, c, oy * k: oy * k + height, ox * k: ox * k + width])
                        offset_r = idx // width
                        offset_c = idx % width
                        self.flag[b_id, c, oy * k + offset_r, ox * k + offset_c] = 1
                        ret[b_id, c, oy, ox] = in_data[b_id, c, oy * k + offset_r, ox * k + offset_c]
        return ret
    def backward(self, residual):
        in_batch, in_channel, in_row, in_col = self.flag.shape
        k = self.kernel_size
        out_row, out_col = residual.shape[2], residual.shape[3]

        gradient_x = np.zeros_like(self.flag)
        for b_id in range(in_batch):
            for c in range(in_channel):
                for oy in range(out_row):
                    for ox in range(out_col):
                        height = k if (oy + 1) * k <= in_row else in_row - oy * k
                        width = k if (ox + 1) * k <= in_col else in_col - ox * k
                        # copy the residual to the whole window; non-max positions are zeroed below
                        gradient_x[b_id, c, oy * k: oy * k + height, ox * k: ox * k + width] = residual[b_id, c, oy, ox]
        gradient_x[self.flag == 0] = 0
        return gradient_x

class FlattenLayer:
    def __init__(self, name='Flatten'):
        pass
    def forward(self, in_data):
        self.in_batch, self.in_channel, self.r, self.c = in_data.shape
        return in_data.reshape(self.in_batch, self.in_channel * self.r * self.c)
    def backward(self, residual):
        return residual.reshape(self.in_batch, self.in_channel, self.r, self.c)

class SoftmaxLayer:
    def __init__(self, name='Softmax'):
        pass
    def forward(self, in_data):
        # subtract the row maximum for numerical stability; the result is unchanged
        exp_out = np.exp(in_data - np.max(in_data, axis=1, keepdims=True))
        # keepdims=True so that each row is divided by its own sum
        self.top_val = exp_out / np.sum(exp_out, axis=1, keepdims=True)
        return self.top_val
    def backward(self, residual):
        # residual holds the one-hot labels here
        return self.top_val - residual

class Net:
    def __init__(self):
        self.layers = []
    def addLayer(self, layer):
        self.layers.append(layer)
    def train(self, trainData, trainLabel, validData, validLabel, batch_size, iteration):
        train_num = trainData.shape[0]
        for it in range(iteration):
            print('iter=' + str(it))
            for batch_iter in range(0, train_num, batch_size):
                # the last batch may be smaller than batch_size
                end = min(batch_iter + batch_size, train_num)
                self.train_inner(trainData[batch_iter: end], trainLabel[batch_iter: end])
            print("eval=" + str(self.eval(validData, validLabel)))
    def train_inner(self, data, label):
        lay_num = len(self.layers)
        in_data = data
        for i in range(lay_num):
            out_data = self.layers[i].forward(in_data)
            in_data = out_data
        residual_in = label
        # walk the layers from last to first
        for i in range(lay_num - 1, -1, -1):
            residual_out = self.layers[i].backward(residual_in)
            residual_in = residual_out
    def eval(self, data, label):
        lay_num = len(self.layers)
        in_data = data
        for i in range(lay_num):
            out_data = self.layers[i].forward(in_data)
            in_data = out_data
        out_idx = np.argmax(in_data, axis=1)
        label_idx = np.argmax(label, axis=1)
        # fraction of samples whose predicted class matches the label
        return np.sum(out_idx == label_idx) / float(out_idx.shape[0])
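The two formulas used in ConvLayer.backward (the weight gradient as a correlation of the input with the residual, and the input gradient as a correlation of the zero-padded residual with the rotated kernel) can be checked numerically. The sketch below is not part of the original listing. It re-implements conv2, rot180 and padding on plain Python lists so that it runs without NumPy, and loss, numeric_grad and max_gradient_error are hypothetical helper names introduced only for this check. It assumes a single channel, a 5×5 input and a 3×3 kernel.

```python
import random

def conv2(x, k):
    """Valid cross-correlation of two 2-D lists (same role as conv2 in the post)."""
    k_row, k_col = len(k), len(k[0])
    out_row, out_col = len(x) - k_row + 1, len(x[0]) - k_col + 1
    return [[sum(x[y + i][c + j] * k[i][j]
                 for i in range(k_row) for j in range(k_col))
             for c in range(out_col)]
            for y in range(out_row)]

def rot180(k):
    """Rotate a 2-D kernel by 180 degrees."""
    return [row[::-1] for row in k[::-1]]

def padding(m, size):
    """Surround a 2-D list with `size` rings of zeros."""
    width = len(m[0]) + 2 * size
    top = [[0.0] * width for _ in range(size)]
    body = [[0.0] * size + list(row) + [0.0] * size for row in m]
    bottom = [[0.0] * width for _ in range(size)]
    return top + body + bottom

def loss(x, k, r):
    """Scalar test loss: conv2(x, k) weighted by a fixed residual r."""
    out = conv2(x, k)
    return sum(out[y][c] * r[y][c]
               for y in range(len(r)) for c in range(len(r[0])))

def numeric_grad(f, m, eps=1e-4):
    """Central finite-difference gradient of f() w.r.t. each entry of m."""
    grad = [[0.0] * len(m[0]) for _ in m]
    for y in range(len(m)):
        for c in range(len(m[0])):
            old = m[y][c]
            m[y][c] = old + eps
            up = f()
            m[y][c] = old - eps
            down = f()
            m[y][c] = old  # restore the entry
            grad[y][c] = (up - down) / (2 * eps)
    return grad

def max_gradient_error(seed=0):
    """Largest gap between analytic and numeric gradients on random data."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    x, k, r = rand(5, 5), rand(3, 3), rand(3, 3)
    # Analytic gradients, same formulas as in ConvLayer.backward
    grad_x = conv2(padding(r, len(k) - 1), rot180(k))
    grad_w = conv2(x, r)
    f = lambda: loss(x, k, r)
    num_x, num_w = numeric_grad(f, x), numeric_grad(f, k)
    err_x = max(abs(a - b) for ra, rb in zip(grad_x, num_x) for a, b in zip(ra, rb))
    err_w = max(abs(a - b) for ra, rb in zip(grad_w, num_w) for a, b in zip(ra, rb))
    return max(err_x, err_w)

print(max_gradient_error())
```

Because this test loss is linear in both the input and the kernel, the finite-difference estimate should agree with the analytic gradients up to floating-point rounding, so the printed error should be very small.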


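The bookkeeping in MaxPoolingLayer (remember where each window's maximum was, then route the residual back to that position only) is easier to follow on a single small matrix. The following is a minimal sketch on plain Python lists rather than the 4-D arrays of the listing. The function names max_pool_forward and max_pool_backward are hypothetical, and it assumes non-overlapping k×k windows with a partial window allowed at the border, as in the class above.

```python
def max_pool_forward(x, k):
    """Non-overlapping k x k max pooling on a 2-D list.

    Returns the pooled values and a 0/1 mask (the `flag` of the post)
    marking where each window's maximum was found.
    """
    rows, cols = len(x), len(x[0])
    out_rows = (rows + k - 1) // k  # ceiling division
    out_cols = (cols + k - 1) // k
    flag = [[0] * cols for _ in range(rows)]
    out = [[0] * out_cols for _ in range(out_rows)]
    for oy in range(out_rows):
        for ox in range(out_cols):
            # (value, row, col) for every cell of the window, clipped at the border
            window = [(x[y][c], y, c)
                      for y in range(oy * k, min((oy + 1) * k, rows))
                      for c in range(ox * k, min((ox + 1) * k, cols))]
            best, by, bc = max(window, key=lambda t: t[0])  # first maximum wins
            flag[by][bc] = 1
            out[oy][ox] = best
    return out, flag

def max_pool_backward(residual, flag, k):
    """Send each residual value back to the position that produced the maximum."""
    return [[residual[y // k][c // k] * flag[y][c]
             for c in range(len(flag[0]))]
            for y in range(len(flag))]

x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 3, 2],
     [2, 6, 1, 4]]
pooled, flag = max_pool_forward(x, 2)
print(pooled)  # [[4, 5], [6, 4]]
print(max_pool_backward([[10, 20], [30, 40]], flag, 2))
# [[0, 0, 0, 0], [10, 0, 0, 20], [0, 0, 0, 0], [0, 30, 0, 40]]
```

Every position that did not win its window receives a zero gradient, which is what the final masking line `gradient_x[self.flag == 0] = 0` achieves in the class.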

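SoftmaxLayer.backward returns `self.top_val - residual`, i.e. the predicted probabilities minus the one-hot label. This matches the gradient of a cross-entropy loss with respect to the scores fed into the softmax. That the loss is cross-entropy is an assumption here, since the listing never writes the loss out explicitly. The sketch below, for a single sample and with hypothetical helper names, compares this closed form with a finite-difference estimate.

```python
import math

def softmax(z):
    """Numerically stable softmax of a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, y):
    """Cross-entropy between softmax(z) and a one-hot label y (assumed loss)."""
    return -sum(t * math.log(p) for t, p in zip(y, softmax(z)))

def softmax_backward(z, y):
    """Closed-form gradient w.r.t. the scores: p - y."""
    return [p - t for p, t in zip(softmax(z), y)]

def numeric_gradient(z, y, eps=1e-6):
    """Central finite-difference gradient of cross_entropy w.r.t. z."""
    grads = []
    for i in range(len(z)):
        up = z[:i] + [z[i] + eps] + z[i + 1:]
        down = z[:i] + [z[i] - eps] + z[i + 1:]
        grads.append((cross_entropy(up, y) - cross_entropy(down, y)) / (2 * eps))
    return grads

scores = [2.0, -1.0, 0.5]
label = [0.0, 1.0, 0.0]
analytic = softmax_backward(scores, label)
numeric = numeric_gradient(scores, label)
print(max(abs(a - n) for a, n in zip(analytic, numeric)))
```

The components of p - y always sum to zero, because both the probabilities and the one-hot label sum to one. That gives a quick sanity check on any softmax backward pass.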