Neural Network Fundamentals

What Is a Neural Network?

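A neural network is a computational model loosely inspired by the way neurons in the brain are connected. By stacking layers of simple units, it can represent complex nonlinear mappings from inputs to outputs.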

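The basic building block of a neural network is the neuron, also called a node. Each neuron receives several input signals, computes their weighted sum plus a bias, and passes the result through an activation function to produce a single output.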

The Mathematical Model of a Neuron

The output of a single neuron is:

$$y = f(\sum_{i=1}^{n} w_i x_i + b)$$

where:

$x_i$ is the $i$-th input
$w_i$ is the weight associated with the $i$-th input
$b$ is the bias term
$f$ is the activation function
$y$ is the output of the neuron
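To make the formula concrete, here is a minimal plain-Python sketch of one neuron with a sigmoid activation. The function name `neuron` and the input, weight, and bias values are illustrative assumptions, chosen so that the weighted sum is exactly 0.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, f=sigmoid):
    """Compute y = f(sum_i w_i * x_i + b) for a single neuron."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b  # weighted sum plus bias
    return f(z)


# Weighted sum: 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0, and sigmoid(0) = 0.5
y = neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
print(y)  # 0.5
```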

Multilayer Network Structure

A multilayer neural network consists of an input layer, one or more hidden layers, and an output layer:

Input layer: receives the external input data
Hidden layers: extract and transform features; a network can have several of them
Output layer: produces the final prediction

Mathematical Representation of the Network

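For a three-layer network (input layer, hidden layer, output layer) with a column-vector input $x$, forward propagation can be written as: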

Hidden layer:
$$z^{(1)} = W^{(1)} x + b^{(1)}$$
$$a^{(1)} = f(z^{(1)})$$

Output layer:
$$z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}$$
$$y = f(z^{(2)})$$

where:

$W^{(l)}$ is the weight matrix of layer $l$, with one row per neuron in that layer
$b^{(l)}$ is the bias vector of layer $l$
$f$ is the activation function
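The two-step forward pass can be sketched in plain Python, treating $x$ as a column vector and storing each $W^{(l)}$ as a list of rows (one row per neuron). The helper names `dense` and `forward` and the hand-picked weights are hypothetical, and ReLU is used as $f$ only so that the arithmetic is easy to follow.

```python
def relu(z):
    """ReLU applied to a scalar."""
    return max(0.0, z)


def dense(W, a, b, f):
    """One layer in column-vector form: f(W a + b), with W stored as a list of rows."""
    return [f(sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i)
            for row, b_i in zip(W, b)]


def forward(x, W1, b1, W2, b2, f=relu):
    """Three-layer network: a1 = f(W1 x + b1), then y = f(W2 a1 + b2)."""
    a1 = dense(W1, x, b1, f)
    return dense(W2, a1, b2, f)


# Illustrative weights: 2 inputs, 2 hidden units, 1 output
W1 = [[1.0, -1.0],
      [0.5, 0.5]]
b1 = [0.0, 0.0]
W2 = [[2.0, -2.0]]
b2 = [1.5]
y = forward([2.0, 1.0], W1, b1, W2, b2)
print(y)  # [0.5]
```

Here $z^{(1)} = (1, 1.5)$, so $a^{(1)} = (1, 1.5)$, and $z^{(2)} = 2 \cdot 1 - 2 \cdot 1.5 + 1.5 = 0.5$.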

Activation Functions

Common Activation Functions

1. Sigmoid Function

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Properties:

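Output range: (0, 1)
Smooth and differentiable everywhere
Suffers from vanishing gradients: its derivative is at most 0.25 and approaches 0 for large $|x|$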

2. ReLU Function

$$\text{ReLU}(x) = \max(0, x)$$

Properties:

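Cheap to compute
Mitigates vanishing gradients, since its derivative is 1 for every positive input
Neurons can "die": a unit whose input stays negative always outputs 0 and receives zero gradient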

3. Tanh Function

$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$

Properties:

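Output range: (-1, 1)
Zero-centered
Still suffers from vanishing gradients, although its maximum derivative is 1 rather than 0.25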
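The gradient remarks above can be checked numerically. The sigmoid derivative never exceeds 0.25, the tanh derivative peaks at 1, and the ReLU derivative is exactly 1 for positive inputs. Backpropagation multiplies these local derivatives layer by layer, so the sketch below multiplies them along a chain of single-unit layers. The chain setup (weight 1 and bias 0 in every layer) and the function names are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # at most 0.25, reached at x = 0


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2  # at most 1.0, reached at x = 0


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0    # exactly 1 for every positive input


def chain_grad(depth, x, f, f_grad):
    """Product of local derivatives along `depth` one-unit layers (weight 1, bias 0)."""
    grad = 1.0
    for _ in range(depth):
        grad *= f_grad(x)  # derivative at this layer's input
        x = f(x)           # this layer's output feeds the next layer
    return grad


print(sigmoid_grad(0.0), tanh_grad(0.0))           # 0.25 1.0
print(chain_grad(10, 0.0, sigmoid, sigmoid_grad))  # smaller than 0.25**10 (about 9.5e-7)
print(chain_grad(10, 1.0, math.tanh, tanh_grad))   # shrinks too, but far less than sigmoid
print(chain_grad(10, 1.0, relu, relu_grad))        # 1.0
```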

Python Implementation of Activation Functions

import numpy as np
import matplotlib.pyplot as plt

class ActivationFunctions:
    """Collection of activation functions and their derivatives."""
    
    @staticmethod
    def sigmoid(x):
        """Sigmoid activation."""
        return 1 / (1 + np.exp(-np.clip(x, -250, 250)))  # clip to prevent overflow in np.exp
    
    @staticmethod
    def sigmoid_derivative(x):
        """Derivative of the sigmoid: s * (1 - s)."""
        s = ActivationFunctions.sigmoid(x)
        return s * (1 - s)
    
    @staticmethod
    def relu(x):
        """ReLU activation."""
        return np.maximum(0, x)
    
    @staticmethod
    def relu_derivative(x):
        """Derivative of ReLU (taken as 0 at x = 0)."""
        return (x > 0).astype(float)
    
    @staticmethod
    def tanh(x):
        """Tanh activation."""
        return np.tanh(x)
    
    @staticmethod
    def tanh_derivative(x):
        """Derivative of tanh: 1 - tanh(x)^2."""
        return 1 - np.tanh(x) ** 2

# Plot the activation functions
def plot_activation_functions():
    """Plot each activation function together with its derivative."""
    x = np.linspace(-5, 5, 100)
    
    plt.figure(figsize=(15, 5))
    
    # Sigmoid
    plt.subplot(1, 3, 1)
    plt.plot(x, ActivationFunctions.sigmoid(x), 'b-', label='Sigmoid')
    plt.plot(x, ActivationFunctions.sigmoid_derivative(x), 'r--', label='Derivative')
    plt.title('Sigmoid Function')
    plt.legend()
    plt.grid(True)
    
    # ReLU
    plt.subplot(1, 3, 2)
    plt.plot(x, ActivationFunctions.relu(x), 'b-', label='ReLU')
    plt.plot(x, ActivationFunctions.relu_derivative(x), 'r--', label='Derivative')
    plt.title('ReLU Function')
    plt.legend()
    plt.grid(True)
    
    # Tanh
    plt.subplot(1, 3, 3)
    plt.plot(x, ActivationFunctions.tanh(x), 'b-', label='Tanh')
    plt.plot(x, ActivationFunctions.tanh_derivative(x), 'r--', label='Derivative')
    plt.title('Tanh Function')
    plt.legend()
    plt.grid(True)
    
    plt.tight_layout()
    plt.show()
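A common way to catch mistakes in hand-written derivatives like those above is to compare them with a central finite difference, $(f(x+h) - f(x-h)) / 2h$. The sketch below does this in plain Python for the same three formulas; the step size `h`, the test points, and the helper names are assumptions. The points avoid $x = 0$, where ReLU is not differentiable.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# (function, analytic derivative) pairs mirroring the ActivationFunctions formulas
PAIRS = {
    "sigmoid": (sigmoid, lambda x: sigmoid(x) * (1.0 - sigmoid(x))),
    "relu": (lambda x: max(0.0, x), lambda x: 1.0 if x > 0 else 0.0),
    "tanh": (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),
}


def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def max_derivative_error(points=(-3.0, -0.7, 0.4, 2.5)):
    """Largest |analytic - numerical| over the test points, for each activation."""
    return {
        name: max(abs(df(x) - numerical_derivative(f, x)) for x in points)
        for name, (f, df) in PAIRS.items()
    }


print(max_derivative_error())  # every value should be far below 1e-6
```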

Forward Propagation

How Forward Propagation Works

Forward propagation is the process of computing a network's output layer by layer, from the input layer to the output layer.

Algorithm Steps

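Input: feed a training sample (or a batch of samples) into the input layer
Layer-by-layer computation: starting from the input layer, compute each layer's output in turn
Activation: apply the activation function to each layer's linear combination
Output: the activations of the last layer are the network's output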

Matrix Form

For a batch of data, forward propagation can be written with matrix operations, storing one sample per row:

$$Z^{(l)} = A^{(l-1)} W^{(l)} + B^{(l)}$$
$$A^{(l)} = f(Z^{(l)})$$

where:

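$A^{(l)}$ is the activation matrix of layer $l$, with $A^{(0)} = X$ (shape: samples × features)
$Z^{(l)}$ is the linear combination (pre-activation) of layer $l$
$W^{(l)}$ is the weight matrix of layer $l$, with shape $(n_{l-1}, n_l)$ in this row-per-sample layout
$B^{(l)}$ is the bias of layer $l$, stored as a $1 \times n_l$ row vector and added to every row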
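In the row-per-sample layout, each row of $A^{(l-1)}$ is one sample, $W^{(l)}$ has shape $(n_{l-1}, n_l)$, and the bias row is added to every row of the product (NumPy broadcasting does this in the implementation). The following plain-Python sketch follows the algorithm steps for a batch of four samples; the helper names and weight values are hypothetical.

```python
import math


def matmul(A, B):
    """Multiply an (m x k) matrix by a (k x n) matrix, both stored as lists of rows."""
    return [[sum(a_ik * B[k][j] for k, a_ik in enumerate(row)) for j in range(len(B[0]))]
            for row in A]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(A_prev, W, b, f):
    """Z = A_prev W + b, with the bias row b added to every sample (row), then A = f(Z)."""
    Z = [[z + b_j for z, b_j in zip(row, b)] for row in matmul(A_prev, W)]
    A = [[f(z) for z in row] for row in Z]
    return Z, A


def forward_batch(X, weights, biases, f):
    """Apply every layer in turn; activations[0] is X itself."""
    activations, z_values = [X], []
    for W, b in zip(weights, biases):
        Z, A = layer_forward(activations[-1], W, b, f)
        z_values.append(Z)
        activations.append(A)
    return activations, z_values


X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]  # 4 samples x 2 features
weights = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],         # W1: 2 x 3 (illustrative values)
           [[0.7], [0.8], [0.9]]]                      # W2: 3 x 1
biases = [[0.0, 0.0, 0.0], [0.0]]
activations, z_values = forward_batch(X, weights, biases, sigmoid)
print([(len(A), len(A[0])) for A in activations])  # [(4, 2), (4, 3), (4, 1)]
```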

Python Implementation of Forward Propagation

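class NeuralNetwork:
    """A simple multilayer neural network (forward pass only)."""
    
    def __init__(self, layers, activation='sigmoid'):
        """
        Initialize the network.
        
        Parameters:
        layers: list giving the number of neurons in each layer
        activation: activation function name ('sigmoid', 'relu' or 'tanh')
        """
        self.layers = layers
        self.num_layers = len(layers)
        self.activation = activation
        
        # Initialize weights and biases
        self.weights = []
        self.biases = []
        
        for i in range(1, self.num_layers):
            # He initialization: standard deviation sqrt(2 / fan_in)
            w = np.random.randn(layers[i-1], layers[i]) * np.sqrt(2.0 / layers[i-1])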
            b = np.zeros((1, layers[i]))
            
            self.weights.append(w)
            self.biases.append(b)
    
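    def forward_propagation(self, X):
        """
        Forward propagation.
        
        Parameters:
        X: input data of shape (n_samples, n_features)
        
        Returns:
        activations: list of activations for each layer (activations[0] is X)
        z_values: list of linear combinations for each layer
        """
        activations = [X]  # activations of every layer
        z_values = []      # linear combinations of every layer
        
        current_input = X
        
        for i in range(len(self.weights)):
            # Linear combination
            z = np.dot(current_input, self.weights[i]) + self.biases[i]
            z_values.append(z)
            
            # Activation function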
            if self.activation == 'sigmoid':
                a = ActivationFunctions.sigmoid(z)
            elif self.activation == 'relu':
                a = ActivationFunctions.relu(z)
            elif self.activation == 'tanh':
                a = ActivationFunctions.tanh(z)
            else:
                raise ValueError(f"Unsupported activation function: {self.activation}")
            
            activations.append(a)
            current_input = a
        
        return activations, z_values
    
    def predict(self, X):
        """Return the network output for X."""
        activations, _ = self.forward_propagation(X)
        return activations[-1]  # output of the last layer

# Test forward propagation
def test_forward_propagation():
    """Run a forward pass on a small network."""
    # A small network: 2 inputs, 3 hidden units, 1 output
    nn = NeuralNetwork([2, 3, 1], activation='sigmoid')
    
    # Test data
    X = np.array([[0, 0],
                  [0, 1],
                  [1, 0],
                  [1, 1]])
    
    # Forward pass
    activations, z_values = nn.forward_propagation(X)
    
    print("Input data:")
    print(X)
    print("\nHidden-layer activations:")
    print(activations[1])
    print("\nOutput-layer result:")
    print(activations[2])
    
    return nn, X, activations, z_values

Loss Functions

Common Loss Functions

1. Mean Squared Error (MSE)

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Used for regression problems.

2. Cross-Entropy Loss

$$\text{CrossEntropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{c} y_{ij} \log(\hat{y}_{ij})$$

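Used for multi-class classification. Here $c$ is the number of classes, $y_{ij}$ is 1 if sample $i$ belongs to class $j$ and 0 otherwise (one-hot labels), and $\hat{y}_{ij}$ is the predicted probability of class $j$ for sample $i$.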

3. Binary Cross-Entropy

$$\text{BinaryCrossEntropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)]$$

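Used for binary classification, where $y_i \in \{0, 1\}$ and $\hat{y}_i$ is the predicted probability that sample $i$ belongs to class 1.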
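The three formulas can be checked by hand on tiny inputs. The sketch below implements them directly in plain Python; the function names are illustrative, and no clipping is applied, so predicted probabilities must lie strictly between 0 and 1.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, y_pred):
    """-(1/n) * sum[y log p + (1 - y) log(1 - p)]."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(Y_true, Y_pred):
    """-(1/n) * sum_i sum_j y_ij log p_ij, with one-hot rows in Y_true."""
    return -sum(t * math.log(p)
                for row_t, row_p in zip(Y_true, Y_pred)
                for t, p in zip(row_t, row_p)) / len(Y_true)


print(mse([1.0, 0.0], [0.8, 0.2]))                    # (0.04 + 0.04) / 2, about 0.04
print(binary_cross_entropy([1, 0], [0.8, 0.2]))      # -ln(0.8), about 0.2231
print(cross_entropy([[0, 1, 0]], [[0.2, 0.7, 0.1]]))  # -ln(0.7), about 0.3567
```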

Python Implementation of Loss Functions

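class LossFunctions:
    """Loss functions and their derivatives."""
    
    @staticmethod
    def mse(y_true, y_pred):
        """Mean squared error loss."""
        return np.mean((y_true - y_pred) ** 2)
    
    @staticmethod
    def mse_derivative(y_true, y_pred):
        """Derivative of MSE with respect to y_pred (includes the 1/n averaging factor)."""
        # Divide by the number of elements, matching np.mean in mse()
        return 2 * (y_pred - y_true) / y_true.size
    
    @staticmethod
    def binary_crossentropy(y_true, y_pred):
        """Binary cross-entropy loss."""
        # Clip predictions to avoid log(0)
        y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
        return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    
    @staticmethod
    def binary_crossentropy_derivative(y_true, y_pred):
        """Derivative of binary cross-entropy with respect to y_pred (includes the 1/n factor)."""
        # Clip predictions to avoid division by zero
        y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
        return (y_pred - y_true) / (y_pred * (1 - y_pred)) / y_true.size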
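When the output activation is sigmoid and the loss is binary cross-entropy, the two derivatives that the backward pass multiplies together cancel. Per sample, $\frac{\hat{y} - y}{\hat{y}(1-\hat{y})} \cdot \hat{y}(1-\hat{y}) = \hat{y} - y$, so the derivative of the loss with respect to the output pre-activation $z$ is simply $\sigma(z) - y$ (before the $1/n$ factor). The sketch below confirms this numerically for a single sample; the helper names and sample values are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_of_logit(z, y):
    """Binary cross-entropy for one sample whose prediction is sigmoid(z)."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def check_simplified_gradient(z, y, h=1e-6):
    """|central finite difference of the loss in z - (sigmoid(z) - y)|."""
    numerical = (bce_of_logit(z + h, y) - bce_of_logit(z - h, y)) / (2 * h)
    analytic = sigmoid(z) - y
    return abs(numerical - analytic)


for z, y in [(-2.0, 1), (0.3, 0), (1.7, 1)]:
    print(z, y, check_simplified_gradient(z, y))  # differences should be close to 0
```

Computing $\hat{y} - y$ directly, rather than dividing by $\hat{y}(1-\hat{y})$ and multiplying it back, also sidesteps the near-zero denominators that the clipping in `binary_crossentropy_derivative` guards against.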

Backpropagation

How Backpropagation Works

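Backpropagation is the core algorithm for training neural networks. It applies the chain rule to compute the gradient of the loss function with respect to every weight and bias, working backward from the output layer.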

The Chain Rule

For a composite function $f(g(x))$, the derivative is:
$$\frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$$
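As a worked example of the chain rule, take one neuron $y = \sigma(wx + b)$ and differentiate with respect to the weight: with $g(w) = wx + b$ and $f = \sigma$, $\frac{dy}{dw} = \sigma'(z) \cdot x = \sigma(z)(1 - \sigma(z))\, x$ where $z = wx + b$. The sketch compares this with a finite difference; the values of $w$, $x$, and $b$ are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def y_of_w(w, x=1.5, b=-0.5):
    """Composite function f(g(w)) with g(w) = w * x + b and f = sigmoid."""
    return sigmoid(w * x + b)


def chain_rule_dy_dw(w, x=1.5, b=-0.5):
    """df/dg * dg/dw = sigmoid'(z) * x, where z = w * x + b."""
    s = sigmoid(w * x + b)
    return s * (1.0 - s) * x


w, h = 0.8, 1e-6
numerical = (y_of_w(w + h) - y_of_w(w - h)) / (2 * h)
print(chain_rule_dy_dw(w), numerical)  # the two values should agree closely
```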

Gradient Computation

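Output-layer error, where $\mathcal{L}$ denotes the loss and $L$ the index of the output layer:
$$\delta^{(L)} = \frac{\partial \mathcal{L}}{\partial z^{(L)}} = \frac{\partial \mathcal{L}}{\partial a^{(L)}} \odot f'(z^{(L)})$$

Hidden-layer error:
$$\delta^{(l)} = \left( (W^{(l+1)})^T \delta^{(l+1)} \right) \odot f'(z^{(l)})$$

Weight gradient:
$$\frac{\partial \mathcal{L}}{\partial W^{(l)}} = \delta^{(l)} (a^{(l-1)})^T$$

Bias gradient:
$$\frac{\partial \mathcal{L}}{\partial b^{(l)}} = \delta^{(l)}$$

These formulas are for a single sample in the column-vector convention of the forward equations, with $a^{(0)} = x$ and $\odot$ denoting element-wise multiplication. In the batch, row-per-sample convention of the matrix form ($Z^{(l)} = A^{(l-1)} W^{(l)} + B^{(l)}$), which the code below uses, the same rules read $\Delta^{(l)} = \left( \Delta^{(l+1)} (W^{(l+1)})^T \right) \odot f'(Z^{(l)})$ and $\frac{\partial \mathcal{L}}{\partial W^{(l)}} = (A^{(l-1)})^T \Delta^{(l)}$, and the bias gradient is the sum of the rows of $\Delta^{(l)}$ over the samples in the batch.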
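A gradient check is a practical way to confirm formulas like these. The sketch below implements them literally for one sample in a hypothetical 2-2-1 network with sigmoid activations and squared-error loss $\sum_k (a^{(2)}_k - y_k)^2$, then compares every weight and bias gradient with a central finite difference. The network size, random weights, and helper names are assumptions; plain Python lists keep it dependency-free.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def forward(params, x):
    """Column convention: z1 = W1 x + b1, a1 = f(z1), z2 = W2 a1 + b2, a2 = f(z2)."""
    W1, b1, W2, b2 = params
    z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    a1 = [sigmoid(z) for z in z1]
    z2 = [sum(w * ai for w, ai in zip(row, a1)) + b for row, b in zip(W2, b2)]
    a2 = [sigmoid(z) for z in z2]
    return z1, a1, z2, a2


def loss(params, x, y):
    """Squared error for one sample: sum_k (a2_k - y_k)^2."""
    a2 = forward(params, x)[3]
    return sum((a - t) ** 2 for a, t in zip(a2, y))


def backward(params, x, y):
    """Analytic gradients from the delta formulas (single sample)."""
    W1, b1, W2, b2 = params
    z1, a1, z2, a2 = forward(params, x)
    # delta2 = dL/da2 (elementwise product) f'(z2), with dL/da2 = 2 (a2 - y)
    d2 = [2.0 * (a - t) * sigmoid_grad(z) for a, t, z in zip(a2, y, z2)]
    # delta1 = (W2^T delta2) (elementwise product) f'(z1)
    d1 = [sum(W2[k][j] * d2[k] for k in range(len(d2))) * sigmoid_grad(z1[j])
          for j in range(len(z1))]
    # dL/dW = delta (a_prev)^T and dL/db = delta
    gW2 = [[d * a for a in a1] for d in d2]
    gW1 = [[d * xi for xi in x] for d in d1]
    return gW1, d1, gW2, d2


def entries(p):
    """Yield (container, index) for every scalar in a vector or a matrix (list of rows)."""
    rows = p if isinstance(p[0], list) else [p]
    for row in rows:
        for j in range(len(row)):
            yield row, j


def flatten(g):
    """Flatten a gradient in the same order as entries()."""
    return [v for row in g for v in row] if isinstance(g[0], list) else list(g)


def max_gradient_error(params, x, y, h=1e-6):
    """Largest |analytic - central finite difference| over all parameters."""
    grads = backward(params, x, y)
    worst = 0.0
    for p, g in zip(params, grads):
        for (row, j), analytic in zip(entries(p), flatten(g)):
            original = row[j]
            row[j] = original + h
            up = loss(params, x, y)
            row[j] = original - h
            down = loss(params, x, y)
            row[j] = original  # restore the parameter
            worst = max(worst, abs((up - down) / (2 * h) - analytic))
    return worst


rng = random.Random(0)
params = [
    [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)],  # W1: 2 x 2
    [0.1, -0.2],                                                 # b1
    [[rng.uniform(-1, 1) for _ in range(2)]],                    # W2: 1 x 2
    [0.05],                                                      # b2
]
print(max_gradient_error(params, x=[0.5, -1.0], y=[1.0]))  # expected to be very small
```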

Python Implementation of Backpropagation

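class NeuralNetwork:
    """Complete neural network implementation (including backpropagation)."""
    
    def __init__(self, layers, activation='sigmoid', loss='mse'):
        """
        Initialize the network.
        
        Parameters:
        layers: list giving the number of neurons in each layer
        activation: activation function name ('sigmoid', 'relu' or 'tanh')
        loss: loss function name ('mse' or 'binary_crossentropy')
        """
        self.layers = layers
        self.num_layers = len(layers)
        self.activation = activation
        self.loss = loss
        
        # Initialize weights and biases
        self.weights = []
        self.biases = []
        
        for i in range(1, self.num_layers):
            # He initialization: standard deviation sqrt(2 / fan_in)
            w = np.random.randn(layers[i-1], layers[i]) * np.sqrt(2.0 / layers[i-1])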
            b = np.zeros((1, layers[i]))
            
            self.weights.append(w)
            self.biases.append(b)
    
    def forward_propagation(self, X):
        """Forward propagation; returns the activations and linear combinations of every layer."""
        activations = [X]
        z_values = []
        
        current_input = X
        
        for i in range(len(self.weights)):
            z = np.dot(current_input, self.weights[i]) + self.biases[i]
            z_values.append(z)
            
            if self.activation == 'sigmoid':
                a = ActivationFunctions.sigmoid(z)
            elif self.activation == 'relu':
                a = ActivationFunctions.relu(z)
            elif self.activation == 'tanh':
                a = ActivationFunctions.tanh(z)
            else:
                raise ValueError(f"Unsupported activation function: {self.activation}")
            
            activations.append(a)
            current_input = a
        
        return activations, z_values
    
    def backward_propagation(self, X, y, activations, z_values):
        """
        Backpropagation.
        
        Parameters:
        X: input data
        y: true labels
        activations: activations from the forward pass
        z_values: linear combinations from the forward pass
        
        Returns:
        weight_gradients: list of weight gradients
        bias_gradients: list of bias gradients
        """
        weight_gradients = []
        bias_gradients = []
        
        # Output-layer error: dL/da for the chosen loss (already averaged over samples)
        if self.loss == 'mse':
            delta = LossFunctions.mse_derivative(y, activations[-1])
        elif self.loss == 'binary_crossentropy':
            delta = LossFunctions.binary_crossentropy_derivative(y, activations[-1])
        else:
            raise ValueError(f"Unsupported loss function: {self.loss}")
        
        # Multiply by the derivative of the output activation to get dL/dz
        if self.activation == 'sigmoid':
            delta *= ActivationFunctions.sigmoid_derivative(z_values[-1])
        elif self.activation == 'relu':
            delta *= ActivationFunctions.relu_derivative(z_values[-1])
        elif self.activation == 'tanh':
            delta *= ActivationFunctions.tanh_derivative(z_values[-1])
        
        # Propagate the error from the output layer back toward the input layer
        for i in range(len(self.weights) - 1, -1, -1):
            # Weight gradient: delta already includes the 1/n factor, so sum over samples
            weight_grad = np.dot(activations[i].T, delta)
            weight_gradients.insert(0, weight_grad)
            
            # Bias gradient: sum of delta over the samples
            bias_grad = np.sum(delta, axis=0, keepdims=True)
            bias_gradients.insert(0, bias_grad)
            
            # Error of the previous layer (not needed for the input layer)
            if i > 0:
                delta = np.dot(delta, self.weights[i].T)
                
                # Multiply by the derivative of the activation function
                if self.activation == 'sigmoid':
                    delta *= ActivationFunctions.sigmoid_derivative(z_values[i-1])
                elif self.activation == 'relu':
                    delta *= ActivationFunctions.relu_derivative(z_values[i-1])
                elif self.activation == 'tanh':
                    delta *= ActivationFunctions.tanh_derivative(z_values[i-1])
        
        return weight_gradients, bias_gradients
    
    def update_parameters(self, weight_gradients, bias_gradients, learning_rate):
        """Apply one gradient-descent step to every weight and bias."""
        for i in range(len(self.weights)):
            self.weights[i] -= learning_rate * weight_gradients[i]
            self.biases[i] -= learning_rate * bias_gradients[i]
    
    def compute_loss(self, y_true, y_pred):
        """Compute the loss."""
        if self.loss == 'mse':
            return LossFunctions.mse(y_true, y_pred)
        elif self.loss == 'binary_crossentropy':
            return LossFunctions.binary_crossentropy(y_true, y_pred)
        else:
            raise ValueError(f"Unsupported loss function: {self.loss}")
    
    def train(self, X, y, epochs, learning_rate, verbose=True):
        """
        Train the network with full-batch gradient descent.
        
        Parameters:
        X: training data
        y: training labels
        epochs: number of training epochs
        learning_rate: learning rate
        verbose: whether to print the loss every 100 epochs
        
        Returns:
        losses: the loss recorded at each epoch
        """
        losses = []
        
        for epoch in range(epochs):
            # Forward pass
            activations, z_values = self.forward_propagation(X)
            
            # Compute the loss
            loss = self.compute_loss(y, activations[-1])
            losses.append(loss)
            
            # Backward pass
            weight_gradients, bias_gradients = self.backward_propagation(X, y, activations, z_values)
            
            # Update the parameters
            self.update_parameters(weight_gradients, bias_gradients, learning_rate)
            
            # Report progress
            if verbose and epoch % 100 == 0:
                print(f"Epoch {epoch}, Loss: {loss:.6f}")
        
        return losses
    
    def predict(self, X):
        """Return the network output for X."""
        activations, _ = self.forward_propagation(X)
        return activations[-1]

Complete Training Example

Solving the XOR Problem

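The XOR (exclusive OR) problem is a classic test case for neural networks. Its two classes cannot be separated by a single straight line, so a single-layer perceptron cannot solve it; a network with at least one hidden layer is required.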
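To see why a hidden layer helps: the positive points (0, 1) and (1, 0) and the negative points (0, 0) and (1, 1) sit on opposite diagonals of the unit square, so no single line separates them. The sketch below uses hand-picked threshold units (an illustration, not necessarily the weights a trained sigmoid network would learn): one hidden unit computes OR, another computes AND, and the output fires for "OR but not AND". It also searches a small grid of weights to show that a single threshold unit reproduces XOR for none of them.

```python
from itertools import product


def step(z):
    """Threshold unit: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


def xor_net(x1, x2):
    """Hand-picked 2-2-1 threshold network (illustrative weights, not learned ones)."""
    h_or = step(x1 + x2 - 0.5)       # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)      # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # "OR but not AND" is exactly XOR


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([xor_net(a, b) for a, b in points])  # [0, 1, 1, 0]

# A single threshold unit step(w1*a + w2*b + c) fails for every weight on this grid
grid = [k / 2 for k in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
single_unit_solves_xor = any(
    [step(w1 * a + w2 * b + c) for a, b in points] == [0, 1, 1, 0]
    for w1, w2, c in product(grid, repeat=3))
print(single_unit_solves_xor)  # False
```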

def solve_xor_problem():
    """Solve the XOR problem with a neural network."""
    
    # XOR data
    X = np.array([[0, 0],
                  [0, 1],
                  [1, 0],
                  [1, 1]])
    
    y = np.array([[0],
                  [1],
                  [1],
                  [0]])
    
    print("XOR data:")
    print("Inputs:", X.tolist())
    print("Expected outputs:", y.flatten().tolist())
    
    # Build a network: 2 inputs, 4 hidden units, 1 output
    nn = NeuralNetwork([2, 4, 1], activation='sigmoid', loss='mse')
    
    # Train the network
    print("\nTraining...")
    losses = nn.train(X, y, epochs=5000, learning_rate=1.0, verbose=True)
    
    # Evaluate on the training inputs
    predictions = nn.predict(X)
    print("\nTraining finished!")
    print("Predictions:")
    for i in range(len(X)):
        print(f"Input: {X[i]} -> predicted: {predictions[i][0]:.4f}, expected: {y[i][0]}")
    
    # Plot the loss curve
    plt.figure(figsize=(10, 6))
    plt.plot(losses)
    plt.title('XOR training loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.grid(True)
    plt.show()
    
    return nn, losses

# Run the examples
if __name__ == "__main__":
    # Visualize the activation functions
    print("=== Activation functions ===")
    plot_activation_functions()
    
    # Test forward propagation
    print("\n=== Forward propagation test ===")
    test_forward_propagation()
    
    # Solve XOR
    print("\n=== XOR problem ===")
    solve_xor_problem()

Binary Classification Example

def binary_classification_example():
    """Binary classification on two Gaussian clusters."""
    
    # Generate synthetic data
    np.random.seed(42)
    n_samples = 1000
    
    # Two Gaussian clusters, one per class
    class_0 = np.random.multivariate_normal([2, 2], [[1, 0.5], [0.5, 1]], n_samples//2)
    class_1 = np.random.multivariate_normal([6, 6], [[1, -0.5], [-0.5, 1]], n_samples//2)
    
    X = np.vstack([class_0, class_1])
    y = np.vstack([np.zeros((n_samples//2, 1)), np.ones((n_samples//2, 1))])
    
    # Standardize the features (zero mean, unit variance)
    X_mean = np.mean(X, axis=0)
    X_std = np.std(X, axis=0)
    X_normalized = (X - X_mean) / X_std
    
    # Build the network
    nn = NeuralNetwork([2, 8, 4, 1], activation='sigmoid', loss='binary_crossentropy')
    
    # Train the network
    print("Training the binary classifier...")
    losses = nn.train(X_normalized, y, epochs=2000, learning_rate=0.5, verbose=True)
    
    # Predict and evaluate
    predictions = nn.predict(X_normalized)
    predicted_classes = (predictions > 0.5).astype(int)
    accuracy = np.mean(predicted_classes == y)
    
    print(f"\nTraining finished! Training accuracy: {accuracy:.4f}")
    
    # Visualize the results
    plt.figure(figsize=(15, 5))
    
    # Original data
    plt.subplot(1, 3, 1)
    plt.scatter(X[y.flatten() == 0, 0], X[y.flatten() == 0, 1], c='red', alpha=0.6, label='Class 0')
    plt.scatter(X[y.flatten() == 1, 0], X[y.flatten() == 1, 1], c='blue', alpha=0.6, label='Class 1')
    plt.title('Original data')
    plt.legend()
    plt.grid(True)
    
    # Predicted classes
    plt.subplot(1, 3, 2)
    plt.scatter(X[predicted_classes.flatten() == 0, 0], X[predicted_classes.flatten() == 0, 1], 
                c='red', alpha=0.6, label='Predicted Class 0')
    plt.scatter(X[predicted_classes.flatten() == 1, 0], X[predicted_classes.flatten() == 1, 1], 
                c='blue', alpha=0.6, label='Predicted Class 1')
    plt.title('Predictions')
    plt.legend()
    plt.grid(True)
    
    # Loss curve
    plt.subplot(1, 3, 3)
    plt.plot(losses)
    plt.title('Training loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.grid(True)
    
    plt.tight_layout()
    plt.show()
    
    return nn, X, y, predictions

# Run the binary classification example
if __name__ == "__main__":
    print("=== Binary classification example ===")
    binary_classification_example()

Optimization Tips

Learning Rate Adjustment

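The learning rate is one of the most important hyperparameters in training: if it is too small, training converges slowly; if it is too large, the loss can oscillate or diverge.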
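The effect of the learning rate is easy to see on a toy problem. The sketch below runs plain gradient descent on $f(w) = w^2$, whose gradient is $2w$, so each update multiplies $w$ by $(1 - 2\eta)$ where $\eta$ is the learning rate; it converges only when $0 < \eta < 1$. The function and the chosen rates are illustrative assumptions, not recommended settings for the networks above.

```python
def gradient_descent(lr, w0=1.0, steps=50):
    """Minimize f(w) = w**2 (gradient 2w) with plain gradient descent."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # each step multiplies w by (1 - 2 * lr)
    return w


for lr in (0.01, 0.1, 0.9, 1.1):
    print(f"lr={lr}: w after 50 steps = {gradient_descent(lr):.3g}")
# lr=0.01 moves slowly, lr=0.1 converges, lr=0.9 overshoots with alternating sign
# (but still shrinks), and lr=1.1 diverges because |1 - 2 * lr| > 1.
```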

class AdaptiveLearningRate:
    """Reduce the learning rate when the loss stops improving (decay on plateau)."""
    
    def __init__(self, initial_lr=0.01, decay_rate=0.95, patience=100):
        self.initial_lr = initial_lr
        self.current_lr = initial_lr
        self.decay_rate = decay_rate
        self.patience = patience
        self.best_loss = float('inf')
        self.wait = 0
    
    def update(self, current_loss):
        """Update the learning rate based on the latest loss and return the current rate."""
        if current_loss < self.best_loss:
            self.best_loss = current_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.current_lr *= self.decay_rate
                self.wait = 0
                print(f"Learning rate reduced to: {self.current_lr:.6f}")
        
        return self.current_lr

Weight Initialization Strategies

class WeightInitializer:
    """Weight initialization strategies."""
    
    @staticmethod
    def xavier_uniform(fan_in, fan_out):
        """Xavier (Glorot) uniform initialization."""
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return np.random.uniform(-limit, limit, (fan_in, fan_out))
    
    @staticmethod
    def xavier_normal(fan_in, fan_out):
        """Xavier (Glorot) normal initialization."""
        std = np.sqrt(2.0 / (fan_in + fan_out))
        return np.random.normal(0, std, (fan_in, fan_out))
    
    @staticmethod
    def he_normal(fan_in, fan_out):
        """He normal initialization (suited to ReLU)."""
        std = np.sqrt(2.0 / fan_in)
        return np.random.normal(0, std, (fan_in, fan_out))
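The three initializers target specific weight variances: 2 / `fan_in` for He normal, and 2 / (`fan_in` + `fan_out`) for both Xavier variants (a uniform distribution on $(-l, l)$ has variance $l^2/3$, which equals 2 / (`fan_in` + `fan_out`) for the limit used above). The factor 2 in He initialization compensates for ReLU zeroing roughly half of its inputs. The plain-Python sketch below draws weights with the standard `random` module and compares the sample variances with these targets; the layer sizes and seed are arbitrary assumptions.

```python
import math
import random


def he_normal(fan_in, fan_out, rng):
    """Normal(0, 2 / fan_in), the same rule as WeightInitializer.he_normal."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


def xavier_uniform(fan_in, fan_out, rng):
    """Uniform(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


def sample_variance(W):
    """Unbiased sample variance of all entries of a matrix."""
    values = [v for row in W for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(0)
fan_in, fan_out = 256, 128
var_he = sample_variance(he_normal(fan_in, fan_out, rng))
var_xavier = sample_variance(xavier_uniform(fan_in, fan_out, rng))
print(var_he, 2.0 / fan_in)                  # both close to 0.0078
print(var_xavier, 2.0 / (fan_in + fan_out))  # both close to 0.0052
```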