Chapter 6: Neural Networks

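1. Principles of neural networks

Most of this section draws on Andrew Ng's deep learning video lectures and on the Andrew Ng deep learning notes that Dr. Huang Haiguang open-sourced on GitHub. What follows is a basic summary based on my own experience; for the finer details, please refer to Andrew Ng's lecture notes.

To cover the statistical principles behind neural networks, I will go through the following five parts, each of which is indispensable to a neural network. I introduce them by comparison with logistic regression; if that model is unfamiliar or unclear, see the earlier chapter on logistic regression. The section ends with an implementation, which gives a closer look at a simple hand-written version (the files are named after TensorFlow, but the code itself uses only NumPy), and with some remarks on how machine learning and deep learning modeling relate.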

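Loss function
The loss function (cost function) is the objective of the optimization. Without a loss function there is no target. Whatever the task, you need to understand the root of the problem before you can model and optimize it. Is it regression or classification? Supervised or unsupervised? Which statistical principle applies? How should the evaluation function be chosen? All of these are settled through the loss function, combined with the concrete setting, when the model is built. Business context is left aside here. For now we optimize only the logistic regression loss in order to classify images, which is a supervised problem. The loss function is therefore (with $E$ the all-ones vector and $h_{\theta}(X)$ the vector of predicted probabilities):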
$$
L(\theta) = -Y^T \log h_{\theta}(X) - (E-Y)^T \log\left(E-h_{\theta}(X)\right)
$$
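As a minimal sketch, the vectorized loss above can be evaluated with NumPy and compared against the element-wise sum. The names `logistic_loss`, `h`, and `y` are illustrative and do not come from the original; the `eps` clipping is an added assumption that guards against `log(0)`.

```python
import numpy as np

def logistic_loss(h, y, eps=1e-12):
    """L = -y^T log(h) - (E - y)^T log(E - h), with E the all-ones vector."""
    # Clip the probabilities so that log() never sees exactly 0 (assumption, not in the text)
    h = np.clip(np.asarray(h, dtype=float).ravel(), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float).ravel()
    ones = np.ones_like(y)  # plays the role of E
    return float(-(y @ np.log(h)) - (ones - y) @ np.log(ones - h))

h = np.array([0.9, 0.2, 0.7])   # predicted probabilities h_theta(X)
y = np.array([1.0, 0.0, 1.0])   # true labels Y

loss = logistic_loss(h, y)
# The same quantity written as an element-wise sum over the examples
elementwise = float(-np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)))
print(loss, elementwise)
```

The inner products in the matrix form are just the two sums of the element-wise form, which is why both numbers agree.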
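Nonlinear activation functions
Why do we need nonlinear activation functions? In deep learning they are indispensable. Without them, a deep network can only solve the small minority of problems that are linear.

The least squares method discussed earlier as groundwork can only fit linear relationships, and a deep network without nonlinear activations fits just as poorly. In the era of big data, patterns of human behavior are roughly Gaussian, which suggests tackling the most important problems first. Classification problems make up a large share of them. Time series problems also matter and will be discussed later.

In short, without nonlinear activation functions, a deep network will fit poorly no matter how many hidden layers it has or how wide it is. Data that follow a clean linear pattern are rare in the real world, and because a composition of linear maps is itself linear, the trained network still represents only a linear function.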
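The collapse of stacked linear layers can be checked numerically. The sketch below is an illustration with made-up shapes and random weights (`W1`, `b1`, `W2`, `b2` are hypothetical, not the author's parameters); `tanh` stands in for any nonlinear activation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))            # 3 features, 5 examples (one per column)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))

# Two stacked layers with no activation in between
two_layer = W2 @ (W1 @ X + b1) + b2

# The same map written as a single linear layer
W_eq = W2 @ W1
b_eq = W2 @ b1 + b2
one_layer = W_eq @ X + b_eq
print(np.allclose(two_layer, one_layer))

# With a nonlinearity between the layers the collapse no longer holds
nonlinear = W2 @ np.tanh(W1 @ X + b1) + b2
print(np.allclose(nonlinear, one_layer))
```

However many linear layers are stacked, the product of their weight matrices is one matrix, so depth adds no expressive power until a nonlinearity is inserted.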
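Computational graph (how gradient descent flows through a neural network)
In an age where APIs are everywhere, TensorFlow 2.0, PyTorch, and Keras all offer model interfaces as convenient as sklearn's, and there are even simpler toolkits such as AutoML and AutoGluon. But to work quickly and tune hyperparameters well, you still have to start from the fundamentals. The computational graph is the heart of a neural network.
1. What is a neural network
  
  A neural network is a graph, structured roughly as shown in the figure above. For a network with one hidden layer and a single input $x$, the mathematical expression is:
  
  $$
  \begin{aligned}
  z^{[1]} &= W^{[1]}x + b^{[1]} \\
  a^{[1]} &= \sigma(z^{[1]}) \\
  z^{[2]} &= W^{[2]}a^{[1]} + b^{[2]} \\
  a^{[2]} &= \sigma(z^{[2]})
  \end{aligned}
  $$
  
  where $\sigma(z^{[i]})$ is the nonlinear activation function. Each pair of lines describes one layer. Stacking the $m$ training examples as the columns of $X$ then vectorizes the computation:
  
  $$
  \begin{aligned}
  Z^{[1]} &= W^{[1]}X + b^{[1]} \\
  A^{[1]} &= \sigma(Z^{[1]}) \\
  Z^{[2]} &= W^{[2]}A^{[1]} + b^{[2]} \\
  A^{[2]} &= \sigma(Z^{[2]})
  \end{aligned}
  $$
  
  As with the vectorization in earlier chapters, the formulas become simpler. The non-vectorized version has to loop over the examples with a for loop, whereas the vectorized matrix computation is much faster. In the matrices $Z$ and $X$, the horizontal direction indexes the training examples; the vertical direction indexes the input features (for $X$) or the units of the layer (for $Z$).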
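The equivalence between the looped and the vectorized forward pass can be sketched as follows. The layer sizes and the helper `sigmoid` are assumptions chosen for illustration; only the first layer is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_x, n_h, m = 4, 3, 6                  # input features, hidden units, examples
X = rng.normal(size=(n_x, m))          # one column per training example
W1 = rng.normal(size=(n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))

# Non-vectorized: loop over the examples one column at a time
A1_loop = np.zeros((n_h, m))
for i in range(m):
    z = W1 @ X[:, [i]] + b1            # shape (n_h, 1)
    A1_loop[:, [i]] = sigmoid(z)

# Vectorized: one matrix product handles every example at once
A1_vec = sigmoid(W1 @ X + b1)          # b1 broadcasts across the m columns

print(A1_vec.shape, np.allclose(A1_loop, A1_vec))
```

Each column of `A1_vec` is the activation of one example and each row is one hidden unit, which matches the horizontal/vertical convention described above.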
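2. Gradient descent for neural networks
  1. Forward propagation
    
    Forward propagation is simple: compute $z^{[1]} \rightarrow a^{[1]} \rightarrow z^{[2]} \rightarrow a^{[2]}$ in that order.
  2. Backward propagation
    
    Backward propagation applies the chain rule to obtain the gradients that update the parameters $W, b$ and thereby minimize the loss function.
    
    Writing $dZ^{[l]} = \partial L / \partial Z^{[l]}$, and $dW^{[l]}, db^{[l]}$ for the gradients of the averaged cost $J = \frac{1}{m}L$, the main formulas for a sigmoid output unit are:
    
    $$
    \begin{aligned}
    dZ^{[2]} &= A^{[2]} - Y \\
    dW^{[2]} &= \frac{1}{m}\, dZ^{[2]} A^{[1]T} \\
    db^{[2]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)} \\
    dZ^{[1]} &= W^{[2]T} dZ^{[2]} \odot \sigma'(Z^{[1]}) \\
    dW^{[1]} &= \frac{1}{m}\, dZ^{[1]} X^{T} \\
    db^{[1]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}
    \end{aligned}
    $$
    
    where $n^{[1]}$ is the number of hidden units and $n^{[2]}$ the number of output units, so $W^{[2]}$ and $dW^{[2]}$ have shape $(n^{[2]}, n^{[1]})$. Each iteration is optimized against the targets $Y$ with entries $y^{(i)}, (i = 1,2,\dots,m)$, and $L$ is the logistic loss given above. Each parameter is then updated as $W := W - \alpha\, dW$, $b := b - \alpha\, db$ with learning rate $\alpha$.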
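The sketch below puts forward and backward propagation together for a network with one hidden layer, and checks one analytic gradient against a finite-difference estimate. It assumes sigmoid activations in both layers and the averaged logistic cost; the function names (`forward`, `cost`, `backward`) and the layer sizes are illustrative choices, not the author's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    # z1 -> a1 -> z2 -> a2, vectorized over all examples
    A1 = sigmoid(params["W1"] @ X + params["b1"])
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    return A1, A2

def cost(params, X, Y):
    # Averaged logistic loss over the m examples
    _, A2 = forward(params, X)
    m = X.shape[1]
    return float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)

def backward(params, X, Y):
    # Chain rule, applied from the output layer back to the input
    m = X.shape[1]
    A1, A2 = forward(params, X)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (params["W2"].T @ dZ2) * A1 * (1 - A1)   # sigmoid'(Z1) = A1 * (1 - A1)
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

rng = np.random.default_rng(2)
n_x, n_h, n_y, m = 3, 4, 1, 5
X = rng.normal(size=(n_x, m))
Y = rng.integers(0, 2, size=(n_y, m)).astype(float)
params = {
    "W1": rng.normal(size=(n_h, n_x)) * 0.5, "b1": np.zeros((n_h, 1)),
    "W2": rng.normal(size=(n_y, n_h)) * 0.5, "b2": np.zeros((n_y, 1)),
}
grads = backward(params, X, Y)

# Numerical check of one entry of dW1 by central differences
eps = 1e-6
plus = {k: v.copy() for k, v in params.items()}
minus = {k: v.copy() for k, v in params.items()}
plus["W1"][0, 0] += eps
minus["W1"][0, 0] -= eps
numeric = (cost(plus, X, Y) - cost(minus, X, Y)) / (2 * eps)
print(grads["W1"][0, 0], numeric)

# One gradient-descent step on every parameter
learning_rate = 0.1
updated = {k: params[k] - learning_rate * grads[k] for k in params}
```

A finite-difference check like this is a cheap way to catch sign or transpose mistakes in hand-written backpropagation before training on real data.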
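Regularization and normalization
This topic deserves a detailed discussion; for data processing, modeling, and model optimization, see Machine Learning Yearning.
1. Why regularize
  
  Why regularize? Why analyze the variance and bias of a model's results? How well does the model generalize? These three questions share one goal: is your result actually good? As the figure below shows, the quality of a model is directly tied to its variance and bias.
  
  Neural networks differ somewhat from classical machine learning here. Their regularization methods include:
  - L1 and L2
  - dropout (roughly, randomly removing some nodes during training)
  - early stopping (stop training once the evaluation curve on held-out data, for example ROC AUC, flattens out over the iterations)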
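Two of the methods in the list can be sketched in a few lines. The code below assumes the common formulation of the L2 penalty, $\frac{\lambda}{2m}\sum \lVert W \rVert^2$, and "inverted" dropout, which rescales the kept units; `l2_penalty`, `dropout_forward`, `lambd`, and `keep_prob` are hypothetical names for illustration.

```python
import numpy as np

def l2_penalty(weights, lambd, m):
    """L2 term added to the cost: (lambda / (2m)) * sum of squared weights."""
    return lambd / (2 * m) * sum(np.sum(W ** 2) for W in weights)

def dropout_forward(A, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, then rescale."""
    mask = rng.random(A.shape) < keep_prob
    return A * mask / keep_prob, mask

rng = np.random.default_rng(3)

W1 = np.array([[1.0, -2.0], [0.5, 0.0]])
penalty = l2_penalty([W1], lambd=0.1, m=10)
print(penalty)

A = np.ones((4, 1000))                 # stand-in activations of a hidden layer
A_drop, mask = dropout_forward(A, keep_prob=0.8, rng=rng)
print(A_drop.mean())                   # stays near 1 because of the rescaling
```

Dividing by `keep_prob` keeps the expected activation unchanged, so nothing needs to be rescaled at prediction time, when dropout is switched off.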
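2. Why normalize
  
  The greatest strength of neural networks is that they can handle very large amounts of data and still give good results. The side effect is slow training, which is why normalization is needed: it makes the model train faster.
  
  For neural networks, normalization also comes in several refined variants.
  
  From the point of view of the data samples, the data set is usually too large to process at once, so deep learning feeds it in mini-batches whose size is set by the batch size. Normalizing the data of each mini-batch gives rise to several kinds of normalization, batch normalization among them. They are worth looking up if you are interested.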
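A minimal sketch of per-mini-batch standardization follows. It only shows the centering and scaling step; full batch normalization also learns a scale and shift per feature, which is omitted here. The function name `batch_standardize`, the data, and the batch size are assumptions for illustration.

```python
import numpy as np

def batch_standardize(Z, eps=1e-8):
    """Standardize each feature (row) over the examples (columns) of one mini-batch."""
    mu = Z.mean(axis=1, keepdims=True)
    var = Z.var(axis=1, keepdims=True)
    return (Z - mu) / np.sqrt(var + eps)   # eps avoids division by zero

rng = np.random.default_rng(4)
X = rng.normal(loc=50.0, scale=10.0, size=(3, 256))   # 3 features, 256 examples
batch_size = 64

for start in range(0, X.shape[1], batch_size):
    batch = X[:, start:start + batch_size]            # one mini-batch of columns
    normed = batch_standardize(batch)
    print(start, normed.mean(axis=1).round(6), normed.std(axis=1).round(3))
```

After this step every feature in a mini-batch has roughly zero mean and unit variance, which keeps the inputs of each layer on a comparable scale and tends to speed up gradient descent.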
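3. What are exploding and vanishing gradients
  
  Consider the forward pass of an $l$-layer network. Taking linear activations and zero biases for simplicity, the final output is:
  
  $$
  \hat{y} = W^{[l]}W^{[l-1]}\cdots W^{[2]}W^{[1]}x
  $$
  
  The formula shows that if the $W$ are a little too large or too small, the repeated product makes the activations, and likewise the gradients computed during training, grow or shrink exponentially with depth. This is known as the exploding and vanishing gradient problem.
  
  Neural networks are therefore sensitive to parameter initialization, which should be tuned to the data and the activation function in use. For details, see Machine Learning Yearning mentioned above.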
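The exponential growth or decay can be seen with a toy network whose layers all use the same weight matrix `scale * I`. This is a deliberately simplified assumption (linear activations, no biases), and `forward_norms` is a hypothetical helper. The last lines show one commonly used remedy, scaling random initial weights by the fan-in; whether it is the best choice depends on the activation function.

```python
import numpy as np

def forward_norms(scale, n_layers=50, width=4):
    """Norm of the activation after each of n_layers linear layers W = scale * I."""
    a = np.ones((width, 1))
    W = scale * np.eye(width)
    norms = []
    for _ in range(n_layers):
        a = W @ a                      # one layer, linear activation, no bias
        norms.append(float(np.linalg.norm(a)))
    return norms

grown = forward_norms(1.5)[-1]         # grows like 1.5 ** 50
shrunk = forward_norms(0.5)[-1]        # shrinks like 0.5 ** 50
print(grown, shrunk)

# A common remedy: draw initial weights with variance 1 / fan_in
rng = np.random.default_rng(5)
n_in, n_out = 512, 512
W_init = rng.normal(size=(n_out, n_in)) * np.sqrt(1.0 / n_in)
print(W_init.std())
```

With weights only 50% above or below the identity, fifty layers are enough to push the signal many orders of magnitude away from its starting scale, which is why the initial scale of `W` matters so much.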
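Optimization algorithms
As data volumes have grown, the optimization algorithms used for neural networks have improved as well. With deep learning so popular, many researchers work on optimization algorithms, producing plenty of useful papers whose results also carry over to industry.

I will not go into how these algorithms are implemented or how well they cope with different data sizes here; interested readers can consult the papers and study them in depth.

Neural network: Python implementation

1. Loading the data

# load_data.py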
import numpy as np
import matplotlib.pyplot as plt
import h5py


def load_dataset():
    # Read the training set; the file is closed once the arrays are copied out
    with h5py.File('./datasets/train_catvnoncat.h5', "r") as train_dataset:
        train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
        train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    # Read the test set and the class names
    with h5py.File('./datasets/test_catvnoncat.h5', "r") as test_dataset:
        test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
        test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels
        classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    # Reshape the labels into row vectors of shape (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

index = 10
plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") +  "' picture.")

m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]

print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))

train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0],-1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0],-1).T

print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))

train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.

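2. Building the functions

# TensorFlow_build_function.py
import numpy as np

def sigmiod(z):
    # Logistic sigmoid activation (the function keeps its original spelling, which the rest of the code uses)
    s = 1.0/ (1 + np.exp(-z))
    return s

def initialize_w_and_b_info(dim):
    # w = np.random.randn()
    # Start from zeros to keep the framework simple. That works for the single-unit
    # logistic model here, but it is a bad choice once hidden layers are added: zero
    # weights leave the hidden units symmetric, so they all compute the same thing
    # no matter how many units or layers are added.
    w = np.zeros((dim,1)) 
    b = 0 # initializing b to zero is fine
    # W1 = np.random.randn(n_h, n_x) * 0.01
    # b1 = np.zeros((n_h, 1))
    # assert (b.shape == (dim, 1))
    assert (w.shape == (dim,1))
    assert (isinstance(b, float) or isinstance(b, int))
    return w, b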

# Forward pass (cost) and backward pass (gradients) of the network
def propagate(w, b, X, Y):
    """
       Implement the cost function and its gradient for the propagation explained above

       Arguments:
       w -- weights, a numpy array of size (num_px * num_px * 3, 1)
       b -- bias, a scalar
       X -- data of size (num_px * num_px * 3, number of examples)
       Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

       Return:
       cost -- negative log-likelihood cost for logistic regression
       dw -- gradient of the loss with respect to w, thus same shape as w
       db -- gradient of the loss with respect to b, thus same shape as b

       Tips:
       - Write your code step by step for the propagation. np.log(), np.dot()
       """
    m = X.shape[1]
    # Forward pass: activations and averaged cost
    A = sigmiod(np.dot(w.T, X) + b)
    cost = -np.sum(Y*np.log(A) + (1- Y)* np.log(1- A))/m
    # Backward pass: gradients of the cost
    dw = np.dot(X,(A- Y).T)/m
    db = np.sum(A - Y)/m  # scalar, so it has the same shape as the scalar bias b

    assert (dw.shape == w.shape)
    assert (db.dtype == float )

    cost  = np.squeeze(cost)
    assert (cost.shape == ())

    grads = {
            'dw' : dw,
            'db' : db
    }

    return grads ,cost

# Test data for checking broadcasting and the forward/backward pass
w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
# grads, cost = propagate(w, b, X, Y)
# print ("dw = " + str(grads["dw"]))
# print ("db = " + str(grads["db"]))
# print ("cost = " + str(cost))


# Gradient descent: update w and b, returning the results in dictionaries
def optimize(w, b, X, Y,num_iterations, learning_rate, print_cost = False):
    costs = []

    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)

        dw = grads['dw']
        db = grads['db']

        w = w - learning_rate*dw
        b = b - learning_rate*db

        # Record the cost every 100 iterations
        if i % 100 == 0 :
            costs.append(cost)

        # Print the cost only when the caller asked for it
        if print_cost and i % 100 == 0:
            print('cost after iteration %i : %f' %(i, cost))
    params = {
        'w' : w,
        'b' : b
    }

    grads = {
        'dw' : dw,
        'db' : db
    }

    return params , grads , costs


# Test that the gradient-descent loop runs correctly
# params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)
# print ("w = " + str(params["w"]))
# print ("b = " + str(params["b"]))
# print ("dw = " + str(grads["dw"]))
# print ("db = " + str(grads["db"]))

def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape((X.shape[0]), 1)

    A = sigmiod(np.dot(w.T,X) + b)

    # Threshold the probabilities at 0.5 to get 0/1 predictions
    Y_prediction = (A > 0.5).astype(float)
    assert (Y_prediction.shape == (1, m))

    return Y_prediction

3. Building the model

# TensorFlow_model.py
from TensorFlow_learn_work_package.week1_work.tensorFlow_build_function import *

from TensorFlow_learn_work_package.week1_work.load_data import *
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    w, b = initialize_w_and_b_info(X_train.shape[0])

    # Pass the caller's print_cost through instead of hard-coding False
    parameters, grads, costs = optimize(w, b , X_train, Y_train, num_iterations, learning_rate, print_cost=print_cost)
    w = parameters['w']
    b = parameters['b']

    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Accuracy = 100% minus the mean absolute error between 0/1 predictions and labels
    print("model_train_accuracy: {}%".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)* 100)))
    print("model_test_accuracy: {}%".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)* 100)))

    d = {
        "costs" : costs,
        "Y_prediction_train" : Y_prediction_train,
        "Y_prediction_test" : Y_prediction_test,
        "learning_rate" : learning_rate,
        'w' : w , 'b' : b,
        'num_iterations' : num_iterations
    }

    return d
# Train the model
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

Data and source code
