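Overview of Neural Network Learning

Neural network learning is the process of automatically obtaining optimal weight parameters from training data. This chapter covers the core ideas behind that process: loss functions, numerical differentiation, and gradient descent.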

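Definition of Learning

In machine learning, learning means adjusting a model's parameters from training data so that the model predicts accurately on unseen data. For a neural network, learning is the search for the best weights and biases.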

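The Four Elements of Learning

Loss function: measures the gap between the model's predictions and the true values
Optimization algorithm: the method used to search for good parameters, such as gradient descent
Training data: known input-output pairs used to fit the model
Model parameters: the weights and biases to be learned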

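Loss Functions

The Role of the Loss Function

The loss function guides neural network learning: it quantifies the difference between the model's predictions and the true labels.

Averaged over the training set, the loss is written as: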
$$L(\theta) = \frac{1}{n} \sum_{i=1}^{n} l(f(x_i; \theta), y_i)$$

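where:

$\theta$ denotes the model parameters (weights and biases)
$f(x_i; \theta)$ is the model's prediction for input $x_i$
$y_i$ is the true label
$l(\cdot, \cdot)$ is the loss on a single sample
$n$ is the number of training samples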
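The definition above can be made concrete with a minimal sketch. The linear model `model`, the squared-error `per_sample_loss`, and the three data points are assumptions chosen only for illustration; they are not part of the chapter's own code.

```python
import numpy as np

def per_sample_loss(y_hat, y):
    """Squared error on one sample; stands in for l(., .)."""
    return (y_hat - y) ** 2

def model(x, theta):
    """Hypothetical linear model f(x; theta) = w * x + b."""
    w, b = theta
    return w * x + b

def empirical_loss(theta, xs, ys):
    """L(theta): the mean of the per-sample losses over the data set."""
    return np.mean([per_sample_loss(model(x, theta), y) for x, y in zip(xs, ys)])

xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 3.0, 5.0])  # generated by y = 2x + 1

loss_exact = empirical_loss((2.0, 1.0), xs, ys)  # the generating parameters
loss_off = empirical_loss((1.0, 1.0), xs, ys)    # a worse guess
print(loss_exact, loss_off)
```

The generating parameters give zero loss, while the worse guess gives a positive value. That ordering is the signal the optimizer follows.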

Common Loss Functions

1. Mean Squared Error (MSE)

Suited to regression problems.

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

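Properties:

Sensitive to outliers, because each error is squared
Differentiable everywhere, which makes gradients easy to compute
Geometric interpretation: the squared Euclidean distance between predictions and targets, scaled by $1/n$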
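The outlier sensitivity noted above can be checked numerically. The sketch below compares MSE with the mean absolute error (MAE); MAE is not discussed in this chapter and is used here only as a baseline, and the sample values are made up for illustration.

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_clean = np.array([1.1, 1.9, 3.2, 3.8])
y_outlier = y_clean.copy()
y_outlier[-1] = 14.0  # one prediction is off by 10

mse_ratio = mse(y_true, y_outlier) / mse(y_true, y_clean)
mae_ratio = mae(y_true, y_outlier) / mae(y_true, y_clean)
print(f"MSE grows by a factor of {mse_ratio:.0f}, MAE by {mae_ratio:.1f}")
```

A single bad prediction inflates MSE far more than MAE, because the error of 10 enters the MSE as 100.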

2. Cross-Entropy Loss

Suited to multi-class classification problems.

$$\text{CrossEntropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{c} y_{ij} \log(\hat{y}_{ij})$$

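where $c$ is the number of classes, $y_{ij}$ is the one-hot encoded true label, and $\hat{y}_{ij}$ is the predicted probability that sample $i$ belongs to class $j$.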
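A small worked example may help here. With one-hot labels, only the log-probability assigned to the correct class contributes to each sample's loss. The two samples below are illustrative values, not data from the chapter.

```python
import numpy as np

# Two samples, three classes; each label row is one-hot
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])

# The inner sum over classes keeps only -log(probability of the true class)
per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
loss = per_sample.mean()
print(per_sample, loss)
```

The two per-sample terms are $-\log 0.7$ and $-\log 0.8$, so the more confident correct prediction contributes the smaller loss.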

3. Binary Cross-Entropy

Suited to binary classification problems.

$$\text{BinaryCrossEntropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)]$$
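A short derivation shows why this loss pairs naturally with a sigmoid output. It assumes the prediction is $\hat{y} = \sigma(z)$ for a pre-activation $z$ (notation introduced here for illustration) and considers a single sample:

```latex
\begin{aligned}
l &= -\bigl[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\bigr], \qquad \hat{y} = \sigma(z) \\
\frac{\partial l}{\partial \hat{y}} &= -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}}
  = \frac{\hat{y} - y}{\hat{y}\,(1 - \hat{y})} \\
\frac{\partial \hat{y}}{\partial z} &= \hat{y}\,(1 - \hat{y}) \\
\frac{\partial l}{\partial z} &= \frac{\partial l}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z}
  = \hat{y} - y
\end{aligned}
```

The factor $\hat{y}(1 - \hat{y})$ cancels, so the gradient with respect to $z$ is simply the prediction error.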

Loss Functions in Python

import numpy as np
import matplotlib.pyplot as plt

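class LossFunctions:
    """Collection of loss functions."""

    @staticmethod
    def mse(y_true, y_pred):
        """
        Mean squared error loss.

        Args:
            y_true: true values, shape (n_samples, n_outputs)
            y_pred: predicted values, shape (n_samples, n_outputs)

        Returns:
            Scalar loss value.
        """
        return np.mean((y_true - y_pred) ** 2)

    @staticmethod
    def mse_derivative(y_true, y_pred):
        """Derivative of the MSE with respect to y_pred."""
        # np.mean averages over every element, so divide by the element count
        return 2 * (y_pred - y_true) / np.size(y_true)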
    
    @staticmethod
    def binary_crossentropy(y_true, y_pred):
        """
        Binary cross-entropy loss.

        Args:
            y_true: true labels (0 or 1)
            y_pred: predicted probabilities (between 0 and 1)
        """
        # Clip predictions to avoid log(0)
        epsilon = 1e-15
        y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
        return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

    @staticmethod
    def binary_crossentropy_derivative(y_true, y_pred):
        """Derivative of the binary cross-entropy with respect to y_pred."""
        epsilon = 1e-15
        y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
        # The loss is a mean over every element, so divide by the element count
        return (y_pred - y_true) / (y_pred * (1 - y_pred)) / np.size(y_true)
    
    @staticmethod
    def categorical_crossentropy(y_true, y_pred):
        """
        Multi-class cross-entropy loss.

        Args:
            y_true: one-hot encoded true labels
            y_pred: predicted probability distributions
        """
        epsilon = 1e-15
        y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
        return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

def visualize_loss_functions():
    """Visualize how different loss functions behave."""

    # Generate data
    y_true = 0.5  # target value
    y_pred_range = np.linspace(0.01, 0.99, 100)

    # Evaluate each loss over the range of predictions
    mse_losses = [(y_true - y_pred)**2 for y_pred in y_pred_range]
    bce_losses = [-(y_true * np.log(y_pred) + (1-y_true) * np.log(1-y_pred)) 
                  for y_pred in y_pred_range]

    # Plot the loss curves
    plt.figure(figsize=(12, 5))
    
    plt.subplot(1, 2, 1)
    plt.plot(y_pred_range, mse_losses, 'b-', linewidth=2, label='MSE Loss')
    plt.axvline(x=y_true, color='r', linestyle='--', alpha=0.7, label=f'True Value = {y_true}')
    plt.xlabel('Predicted Value')
    plt.ylabel('Loss')
    plt.title('Mean Squared Error Loss')
    plt.legend()
    plt.grid(True, alpha=0.3)
    
    plt.subplot(1, 2, 2)
    plt.plot(y_pred_range, bce_losses, 'g-', linewidth=2, label='Binary Cross-Entropy Loss')
    plt.axvline(x=y_true, color='r', linestyle='--', alpha=0.7, label=f'True Value = {y_true}')
    plt.xlabel('Predicted Probability')
    plt.ylabel('Loss')
    plt.title('Binary Cross-Entropy Loss')
    plt.legend()
    plt.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

# Test the loss functions
if __name__ == "__main__":
    print("=== Loss function visualization ===")
    visualize_loss_functions()

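Numerical Differentiation

Definition of the Derivative

Differentiation computes the slope of the tangent line to a function at a point. For a function $f(x)$, the derivative at $x$ is defined as:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

A computer cannot take this limit directly in floating-point arithmetic, so numerical differentiation approximates the derivative with a small but finite $h$.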

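Implementing Numerical Differentiation

Forward difference

$$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$

Central difference (more accurate: the truncation error is $O(h^2)$ rather than $O(h)$)

$$f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}$$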
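The accuracy gap between the two formulas can be seen on a function with a known derivative. The sketch below uses $\sin x$ at $x = 1$, whose exact derivative is $\cos 1$. The helper names `forward_diff` and `central_diff` are introduced here for illustration.

```python
import numpy as np

def forward_diff(f, x, h):
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 1.0
exact = np.cos(x0)  # derivative of sin at x0

errors = {}
for h in (1e-1, 1e-2, 1e-3):
    err_fwd = abs(forward_diff(np.sin, x0, h) - exact)
    err_ctr = abs(central_diff(np.sin, x0, h) - exact)
    errors[h] = (err_fwd, err_ctr)
    print(f"h={h:g}: forward error={err_fwd:.2e}, central error={err_ctr:.2e}")
```

Shrinking $h$ by a factor of 10 reduces the forward-difference error by roughly 10 and the central-difference error by roughly 100, consistent with first-order and second-order accuracy.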

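Numerical Differentiation in Python

def numerical_diff(f, x, h=1e-4):
    """
    Numerical derivative using the central difference.

    Args:
        f: function of one variable
        x: point at which to differentiate
        h: small step size

    Returns:
        Approximate value of the derivative.
    """
    return (f(x + h) - f(x - h)) / (2 * h)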

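def numerical_gradient(f, x, h=1e-4):
    """
    Gradient of a multivariable function by central differences.

    Args:
        f: function of a vector
        x: 1-D float input array (perturbed in place, then restored)
        h: small step size

    Returns:
        Gradient vector with the same shape as x.
    """
    grad = np.zeros_like(x)

    for idx in range(x.size):
        tmp_val = x[idx]

        # Evaluate f with coordinate idx shifted by +h
        x[idx] = tmp_val + h
        fxh1 = f(x)

        # Evaluate f with coordinate idx shifted by -h
        x[idx] = tmp_val - h
        fxh2 = f(x)

        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the original value

    return grad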

def test_numerical_diff():
    """Test numerical differentiation against analytical results."""

    # Simple function f(x) = x^2
    def f1(x):
        return x**2

    # Analytical derivative: f'(x) = 2x
    x = 3.0
    numerical_derivative = numerical_diff(f1, x)
    analytical_derivative = 2 * x

    print(f"f(x) = x^2 at x = {x}:")
    print(f"Numerical derivative: {numerical_derivative:.6f}")
    print(f"Analytical derivative: {analytical_derivative:.6f}")
    print(f"Error: {abs(numerical_derivative - analytical_derivative):.8f}")

    # Multivariable function f(x0, x1) = x0^2 + x1^2
    def f2(x):
        return x[0]**2 + x[1]**2

    x = np.array([3.0, 4.0])
    numerical_grad = numerical_gradient(f2, x.copy())
    analytical_grad = np.array([2*x[0], 2*x[1]])

    print(f"\nf(x0, x1) = x0^2 + x1^2 at x = {x}:")
    print(f"Numerical gradient: {numerical_grad}")
    print(f"Analytical gradient: {analytical_grad}")
    print(f"Error: {np.linalg.norm(numerical_grad - analytical_grad):.8f}")

# Run the test
if __name__ == "__main__":
    print("=== Numerical differentiation test ===")
    test_numerical_diff()

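Gradient Descent

How Gradient Descent Works

Gradient descent is an optimization algorithm that repeatedly updates the parameters in the direction opposite to the gradient, moving step by step toward a minimum of the function.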

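Geometric Meaning of the Gradient

Gradient direction: the direction in which the function increases fastest
Gradient magnitude: the rate of change of the function in that direction
Negative gradient direction: the direction in which the function decreases fastest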
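The last point can be checked empirically. The sketch below takes a small step from a fixed point in each of 360 unit directions and confirms that the direction giving the lowest function value lines up with the negative gradient. The test function, the starting point, and the step size are arbitrary choices for illustration.

```python
import numpy as np

def f(x):
    return x[0] ** 2 + 3 * x[1] ** 2

def grad_f(x):
    return np.array([2 * x[0], 6 * x[1]])

x = np.array([1.0, 1.0])
step = 1e-3

# Unit directions spaced one degree apart around the circle
angles = np.deg2rad(np.arange(360))
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
values = np.array([f(x + step * d) for d in directions])

best = directions[np.argmin(values)]  # direction of largest decrease
neg_grad_unit = -grad_f(x) / np.linalg.norm(grad_f(x))
cosine = best @ neg_grad_unit         # 1.0 would mean perfectly aligned
print(best, neg_grad_unit, cosine)
```

The cosine is close to 1. The small remaining gap comes from the one-degree sampling grid and the finite step size.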

Parameter Update Rule

$$\theta_{new} = \theta_{old} - \eta \nabla_\theta L(\theta)$$

where:

$\theta$ is the parameter vector
$\eta$ is the learning rate
$\nabla_\theta L(\theta)$ is the gradient of the loss with respect to the parameters
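A single update step, worked through numerically, makes the rule concrete. The loss $L(\theta) = \theta_0^2 + \theta_1^2$, the starting point, and the learning rate below are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np

def loss(theta):
    return np.sum(theta ** 2)

theta_old = np.array([3.0, 4.0])
eta = 0.1

grad = 2 * theta_old                # gradient of the loss at theta_old: [6, 8]
theta_new = theta_old - eta * grad  # [3 - 0.6, 4 - 0.8] = [2.4, 3.2]

print(theta_new, loss(theta_old), loss(theta_new))
```

One step moves the parameters from (3, 4) to (2.4, 3.2) and lowers the loss from 25 to 16.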

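Choosing the Learning Rate

The learning rate is one of the most important hyperparameters in gradient descent.

Too large: updates overshoot the minimum, and the loss can oscillate or diverge
Too small: convergence is slow and many more iterations are needed
Adaptive learning rate: the step size is adjusted dynamically during training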
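These regimes are easy to reproduce on $f(x) = x^2$, where each update multiplies $x$ by $(1 - 2\eta)$. The helper `run_gradient_descent` and the three learning rates below are illustrative choices, not values taken from the chapter.

```python
def run_gradient_descent(eta, x0=1.0, steps=20):
    """Minimize f(x) = x^2 (gradient 2x) from x0 with step size eta."""
    x = x0
    for _ in range(steps):
        x = x - eta * 2 * x
    return x

results = {eta: run_gradient_descent(eta) for eta in (0.01, 0.4, 1.1)}
for eta, x_final in results.items():
    print(f"eta={eta}: x after 20 steps = {x_final:.3e}")
```

With $\eta = 0.01$ the iterate has barely moved toward 0. With $\eta = 0.4$ it is essentially at the minimum. With $\eta = 1.1$ the factor is $-1.2$, so the iterate flips sign and grows on every step.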

Gradient Descent in Python

class GradientDescent:
    """Gradient descent optimizer."""

    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate
        self.loss_history = []

    def optimize(self, f, grad_f, initial_params, max_iterations=1000, tolerance=1e-6):
        """
        Minimize a function with gradient descent.

        Args:
            f: objective function
            grad_f: gradient function
            initial_params: initial parameters
            max_iterations: maximum number of iterations
            tolerance: convergence threshold on the gradient norm

        Returns:
            The optimized parameters and the loss history.
        """
        params = initial_params.copy()

        for i in range(max_iterations):
            # Record the current loss
            current_loss = f(params)
            self.loss_history.append(current_loss)

            # Compute the gradient
            gradient = grad_f(params)

            # Update the parameters
            params = params - self.learning_rate * gradient

            # Check for convergence
            if np.linalg.norm(gradient) < tolerance:
                print(f"Converged after {i+1} iterations")
                break

        return params, self.loss_history

def test_gradient_descent():
    """Test the gradient descent optimizer."""

    # Objective: f(x, y) = (x-2)^2 + (y-3)^2
    # Minimizer: x=2, y=3, with minimum value 0
    def objective_function(params):
        x, y = params
        return (x - 2)**2 + (y - 3)**2

    def gradient_function(params):
        x, y = params
        grad_x = 2 * (x - 2)
        grad_y = 2 * (y - 3)
        return np.array([grad_x, grad_y])

    # Initial parameters
    initial_params = np.array([0.0, 0.0])

    # Try several learning rates
    learning_rates = [0.01, 0.1, 0.5]
    
    plt.figure(figsize=(15, 5))
    
    for i, lr in enumerate(learning_rates):
        optimizer = GradientDescent(learning_rate=lr)
        optimal_params, loss_history = optimizer.optimize(
            objective_function, gradient_function, initial_params, max_iterations=100
        )
        
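        print(f"\nLearning rate {lr}:")
        print(f"Final parameters: x={optimal_params[0]:.4f}, y={optimal_params[1]:.4f}")
        print(f"Final loss: {objective_function(optimal_params):.8f}")
        print(f"Iterations: {len(loss_history)}")

        # Plot the loss curve; floor at 1e-16 because an exact zero
        # (reached with lr=0.5) cannot be drawn on a log axis
        plt.subplot(1, 3, i+1)
        plt.plot(np.maximum(loss_history, 1e-16))
        plt.title(f'Learning rate = {lr}')
        plt.xlabel('Iteration')
        plt.ylabel('Loss')
        plt.grid(True, alpha=0.3)
        plt.yscale('log')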
    
    plt.tight_layout()
    plt.show()

# Run the test
if __name__ == "__main__":
    print("=== Gradient descent test ===")
    test_gradient_descent()

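Gradient Descent in Neural Networks

Updating Network Parameters

In a neural network, the parameters to update for each layer $l$ are:

Weight matrix $W^{(l)}$
Bias vector $b^{(l)}$

Parameter Update Rule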

$$W^{(l)} = W^{(l)} - \eta \frac{\partial L}{\partial W^{(l)}}$$

$$b^{(l)} = b^{(l)} - \eta \frac{\partial L}{\partial b^{(l)}}$$
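Before applying these updates, it is common to verify analytic gradients against numerical ones. The sketch below does this for a single linear layer with an MSE loss and then applies one update step. The layer, the random data, and the helper `numeric_grad` are assumptions made for illustration; this is not the network implemented later in the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=(5, 1))
W = rng.normal(size=(3, 1))
b = np.zeros((1, 1))

def loss(W, b):
    return np.mean((X @ W + b - y) ** 2)

# Analytic gradients of the MSE with respect to W and b
residual = X @ W + b - y
dW = 2 * X.T @ residual / residual.size
db = 2 * np.sum(residual, axis=0, keepdims=True) / residual.size

def numeric_grad(f, P, h=1e-5):
    """Central-difference gradient of f() with respect to array P (perturbed in place)."""
    G = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + h
        f_plus = f()
        P[idx] = old - h
        f_minus = f()
        P[idx] = old  # restore the entry
        G[idx] = (f_plus - f_minus) / (2 * h)
    return G

dW_num = numeric_grad(lambda: loss(W, b), W)
db_num = numeric_grad(lambda: loss(W, b), b)
print(np.max(np.abs(dW - dW_num)), np.max(np.abs(db - db_num)))

# One update step with the analytic gradients
eta = 0.01
W_new = W - eta * dW
b_new = b - eta * db
print(loss(W, b), loss(W_new, b_new))
```

If the analytic and numerical gradients disagree by more than rounding error, the backward pass most likely contains a mistake.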

A Simple Neural Network Implementation

class SimpleNeuralNetwork:
    """A simple two-layer neural network."""

    def __init__(self, input_size, hidden_size, output_size, learning_rate=0.01):
        """
        Initialize the network.

        Args:
            input_size: size of the input layer
            hidden_size: size of the hidden layer
            output_size: size of the output layer
            learning_rate: learning rate
        """
        self.learning_rate = learning_rate

        # Initialize weights with small random numbers and biases with zeros
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.b2 = np.zeros((1, output_size))

        # Training history
        self.loss_history = []
    
    def sigmoid(self, x):
        """Sigmoid activation function."""
        return 1 / (1 + np.exp(-np.clip(x, -500, 500)))  # clip to avoid overflow

    def sigmoid_derivative(self, x):
        """Derivative of the sigmoid function."""
        s = self.sigmoid(x)
        return s * (1 - s)
    
    def forward(self, X):
        """
        Forward pass.

        Args:
            X: input data, shape (batch_size, input_size)

        Returns:
            The network output; intermediate results are cached on self.
        """
        # First layer
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.sigmoid(self.z1)

        # Second layer
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.sigmoid(self.z2)
        
        return self.a2
    
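    def backward(self, X, y, output):
        """
        Backward pass and parameter update.

        Args:
            X: input data
            y: true labels
            output: output of the forward pass
        """
        m = X.shape[0]  # number of samples

        # Output-layer error: output - y is the gradient with respect to z2
        # for a sigmoid output trained with binary cross-entropy
        dz2 = output - y
        dW2 = (1/m) * np.dot(self.a1.T, dz2)
        db2 = (1/m) * np.sum(dz2, axis=0, keepdims=True)

        # Hidden-layer error
        dz1 = np.dot(dz2, self.W2.T) * self.sigmoid_derivative(self.z1)
        dW1 = (1/m) * np.dot(X.T, dz1)
        db1 = (1/m) * np.sum(dz1, axis=0, keepdims=True)

        # Update the parameters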
        self.W2 -= self.learning_rate * dW2
        self.b2 -= self.learning_rate * db2
        self.W1 -= self.learning_rate * dW1
        self.b1 -= self.learning_rate * db1
    
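    def train(self, X, y, epochs, verbose=True):
        """
        Train the network.

        Args:
            X: training data
            y: training labels
            epochs: number of training epochs
            verbose: whether to print progress
        """
        for epoch in range(epochs):
            # Forward pass
            output = self.forward(X)

            # Compute the loss. backward() uses dz2 = output - y, the gradient of
            # binary cross-entropy through a sigmoid, so the matching loss is logged
            loss = LossFunctions.binary_crossentropy(y, output)
            self.loss_history.append(loss)

            # Backward pass
            self.backward(X, y, output)

            # Print progress
            if verbose and epoch % 100 == 0:
                print(f"Epoch {epoch}, Loss: {loss:.6f}")

    def predict(self, X):
        """Return the network output for X."""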
        return self.forward(X)

def test_simple_neural_network():
    """Test the simple neural network."""

    # Generate simple training data
    np.random.seed(42)
    X = np.random.randn(100, 2)  # 100 samples, 2 features
    y = ((X[:, 0] + X[:, 1]) > 0).astype(int).reshape(-1, 1)  # linearly separable labels

    # Create the network
    nn = SimpleNeuralNetwork(input_size=2, hidden_size=4, output_size=1, learning_rate=1.0)

    # Train the network
    print("Training the simple neural network...")
    nn.train(X, y, epochs=1000, verbose=True)

    # Evaluate predictions on the training set
    predictions = nn.predict(X)
    predicted_classes = (predictions > 0.5).astype(int)
    accuracy = np.mean(predicted_classes == y)

    print("\nTraining finished!")
    print(f"Training accuracy: {accuracy:.4f}")

    # Plot the loss curve
    plt.figure(figsize=(10, 6))
    plt.plot(nn.loss_history)
    plt.title('Training Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.grid(True, alpha=0.3)
    plt.yscale('log')
    plt.show()

    return nn

# Run the test
if __name__ == "__main__":
    print("=== Simple neural network test ===")
    test_simple_neural_network()

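Implementing the Learning Algorithm

Stochastic Gradient Descent (SGD)

Stochastic gradient descent is a variant of gradient descent that computes the gradient from a single sample at each step.

Advantages of SGD

Cheap iterations: each update needs only one sample
Low memory use: well suited to large data sets
May escape poor local minima: the noise in the updates helps exploration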
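The randomness mentioned above can be made precise: a single-sample gradient is a noisy but unbiased estimate of the full-batch gradient. The sketch below checks this for linear regression with a squared-error loss. The synthetic data and the helper `sample_grad` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(2)  # current parameters

def sample_grad(i):
    """Gradient of the squared error on sample i with respect to w."""
    return 2 * (X[i] @ w - y[i]) * X[i]

per_sample = np.array([sample_grad(i) for i in range(len(X))])
full_grad = 2 * X.T @ (X @ w - y) / len(X)

print("full-batch gradient:     ", full_grad)
print("mean of sample gradients:", per_sample.mean(axis=0))
print("std of sample gradients: ", per_sample.std(axis=0))
```

The mean of the per-sample gradients matches the full-batch gradient, while the standard deviation shows how far an individual sample can point away from it.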

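Mini-batch Gradient Descent (Mini-batch SGD)

This combines the advantages of batch gradient descent and stochastic gradient descent:

$$\theta = \theta - \eta \frac{1}{m} \sum_{i \in \text{batch}} \nabla_\theta\, l(f(x_i; \theta), y_i)$$

where $m$ is the number of samples in the mini-batch and $l$ is the per-sample loss defined earlier.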
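A minimal sketch of this update applied to linear regression follows. The data are synthetic and noise-free, and the learning rate, batch size, and epoch count are illustrative assumptions, not tuned recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))
true_w = np.array([1.5, -2.0])
y = X @ true_w  # noise-free targets

w = np.zeros(2)
eta, batch_size = 0.1, 32

for epoch in range(50):
    order = rng.permutation(len(X))  # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        residual = X[batch] @ w - y[batch]
        # Average of the per-sample squared-error gradients over the batch
        grad = 2 * X[batch].T @ residual / len(batch)
        w -= eta * grad

print(w)
```

Each update touches only 32 samples, yet the parameters recover the generating weights because every batch gradient points, on average, in the full-gradient direction.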

A Complete Implementation of the Learning Algorithm

class NeuralNetworkLearner:
    """A complete neural network learner."""

    def __init__(self, layers, learning_rate=0.01, batch_size=32):
        """
        Initialize the learner.

        Args:
            layers: list of layer sizes, e.g. [2, 4, 1]
            learning_rate: learning rate
            batch_size: mini-batch size
        """
        self.layers = layers
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        self.num_layers = len(layers)

        # Initialize weights and biases
        self.weights = []
        self.biases = []

        for i in range(1, self.num_layers):
            # He initialization (suited to ReLU activations)
            w = np.random.randn(layers[i-1], layers[i]) * np.sqrt(2.0 / layers[i-1])
            b = np.zeros((1, layers[i]))

            self.weights.append(w)
            self.biases.append(b)

        # Training history
        self.loss_history = []
        self.accuracy_history = []
    
    def relu(self, x):
        """ReLU activation function."""
        return np.maximum(0, x)

    def relu_derivative(self, x):
        """Derivative of the ReLU function."""
        return (x > 0).astype(float)

    def sigmoid(self, x):
        """Sigmoid activation function."""
        return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

    def sigmoid_derivative(self, x):
        """Derivative of the sigmoid function."""
        s = self.sigmoid(x)
        return s * (1 - s)
    
    def forward_propagation(self, X):
        """Forward pass; returns all activations and pre-activations."""
        activations = [X]
        z_values = []

        current_input = X

        for i in range(len(self.weights)):
            z = np.dot(current_input, self.weights[i]) + self.biases[i]
            z_values.append(z)

            # Hidden layers use ReLU; the output layer uses Sigmoid
            if i < len(self.weights) - 1:
                a = self.relu(z)
            else:
                a = self.sigmoid(z)
            
            activations.append(a)
            current_input = a
        
        return activations, z_values
    
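    def backward_propagation(self, X, y, activations, z_values):
        """Backward pass; returns the weight and bias gradients."""
        m = X.shape[0]

        weight_gradients = []
        bias_gradients = []

        # Output-layer error for an MSE loss with a sigmoid output
        # (the constant factor 2 is absorbed into the learning rate)
        delta = (activations[-1] - y) * self.sigmoid_derivative(z_values[-1])

        # Propagate backward from the output layer toward the input layer
        for i in range(len(self.weights) - 1, -1, -1):
            # Gradients for the weights and biases of layer i
            weight_grad = np.dot(activations[i].T, delta) / m
            bias_grad = np.mean(delta, axis=0, keepdims=True)

            weight_gradients.insert(0, weight_grad)
            bias_gradients.insert(0, bias_grad)

            # Error of the previous layer (skipped at the input layer)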
            if i > 0:
                delta = np.dot(delta, self.weights[i].T) * self.relu_derivative(z_values[i-1])
        
        return weight_gradients, bias_gradients
    
    def update_parameters(self, weight_gradients, bias_gradients):
        """Apply one gradient descent step to all parameters."""
        for i in range(len(self.weights)):
            self.weights[i] -= self.learning_rate * weight_gradients[i]
            self.biases[i] -= self.learning_rate * bias_gradients[i]
    
    def create_mini_batches(self, X, y):
        """Split the data into shuffled mini-batches."""
        m = X.shape[0]
        mini_batches = []

        # Shuffle the data
        permutation = np.random.permutation(m)
        shuffled_X = X[permutation]
        shuffled_y = y[permutation]

        # Build the full-size mini-batches
        num_complete_minibatches = m // self.batch_size

        for k in range(num_complete_minibatches):
            mini_batch_X = shuffled_X[k * self.batch_size:(k + 1) * self.batch_size]
            mini_batch_y = shuffled_y[k * self.batch_size:(k + 1) * self.batch_size]
            mini_batches.append((mini_batch_X, mini_batch_y))

        # Put any leftover samples into a final, smaller batch
        if m % self.batch_size != 0:
            mini_batch_X = shuffled_X[num_complete_minibatches * self.batch_size:]
            mini_batch_y = shuffled_y[num_complete_minibatches * self.batch_size:]
            mini_batches.append((mini_batch_X, mini_batch_y))
        
        return mini_batches
    
    def train(self, X, y, epochs, verbose=True):
        """
        Train the network.

        Args:
            X: training data
            y: training labels
            epochs: number of training epochs
            verbose: whether to print progress
        """
        for epoch in range(epochs):
            epoch_loss = 0
            epoch_accuracy = 0
            num_batches = 0

            # Build the mini-batches for this epoch
            mini_batches = self.create_mini_batches(X, y)

            for mini_batch_X, mini_batch_y in mini_batches:
                # Forward pass
                activations, z_values = self.forward_propagation(mini_batch_X)

                # Compute the loss
                batch_loss = LossFunctions.mse(mini_batch_y, activations[-1])
                epoch_loss += batch_loss

                # Compute the accuracy (binary classification)
                predictions = (activations[-1] > 0.5).astype(int)
                batch_accuracy = np.mean(predictions == mini_batch_y)
                epoch_accuracy += batch_accuracy

                # Backward pass
                weight_gradients, bias_gradients = self.backward_propagation(
                    mini_batch_X, mini_batch_y, activations, z_values
                )

                # Update the parameters
                self.update_parameters(weight_gradients, bias_gradients)

                num_batches += 1

            # Record the average loss and accuracy over the batches
            avg_loss = epoch_loss / num_batches
            avg_accuracy = epoch_accuracy / num_batches

            self.loss_history.append(avg_loss)
            self.accuracy_history.append(avg_accuracy)

            # Print progress
            if verbose and epoch % 100 == 0:
                print(f"Epoch {epoch}, Loss: {avg_loss:.6f}, Accuracy: {avg_accuracy:.4f}")

    def predict(self, X):
        """Return the network output for X."""
        activations, _ = self.forward_propagation(X)
        return activations[-1]

def comprehensive_learning_example():
    """End-to-end learning example."""

    # Generate a more complex data set
    np.random.seed(42)
    n_samples = 1000

    # Generate spiral data
    def generate_spiral_data(n_points, n_classes):
        X = np.zeros((n_points * n_classes, 2))
        y = np.zeros(n_points * n_classes, dtype='uint8')

        for j in range(n_classes):
            ix = range(n_points * j, n_points * (j + 1))
            r = np.linspace(0.0, 1, n_points)
            t = np.linspace(j * 4, (j + 1) * 4, n_points) + np.random.randn(n_points) * 0.2
            X[ix] = np.c_[r * np.sin(t), r * np.cos(t)]
            y[ix] = j

        return X, y

    X, y = generate_spiral_data(n_samples // 2, 2)
    y = y.reshape(-1, 1)

    # Standardize the features
    X_mean = np.mean(X, axis=0)
    X_std = np.std(X, axis=0)
    X_normalized = (X - X_mean) / X_std

    # Create the learner
    learner = NeuralNetworkLearner(
        layers=[2, 16, 8, 1], 
        learning_rate=0.1, 
        batch_size=64
    )

    # Train the network
    print("Training the end-to-end example...")
    learner.train(X_normalized, y, epochs=2000, verbose=True)

    # Evaluate on the training set
    predictions = learner.predict(X_normalized)
    predicted_classes = (predictions > 0.5).astype(int)
    accuracy = np.mean(predicted_classes == y)

    print("\nTraining finished!")
    print(f"Final accuracy: {accuracy:.4f}")
    
    # Visualize the results
    plt.figure(figsize=(15, 5))

    # Original data
    plt.subplot(1, 3, 1)
    plt.scatter(X[y.flatten() == 0, 0], X[y.flatten() == 0, 1], c='red', alpha=0.6, label='Class 0')
    plt.scatter(X[y.flatten() == 1, 0], X[y.flatten() == 1, 1], c='blue', alpha=0.6, label='Class 1')
    plt.title('Original Data')
    plt.legend()
    plt.grid(True, alpha=0.3)

    # Predictions
    plt.subplot(1, 3, 2)
    plt.scatter(X[predicted_classes.flatten() == 0, 0], X[predicted_classes.flatten() == 0, 1], 
                c='red', alpha=0.6, label='Predicted Class 0')
    plt.scatter(X[predicted_classes.flatten() == 1, 0], X[predicted_classes.flatten() == 1, 1], 
                c='blue', alpha=0.6, label='Predicted Class 1')
    plt.title('Predictions')
    plt.legend()
    plt.grid(True, alpha=0.3)

    # Training history
    plt.subplot(1, 3, 3)
    plt.plot(learner.loss_history, label='Loss', alpha=0.7)
    plt.plot(learner.accuracy_history, label='Accuracy', alpha=0.7)
    plt.title('Training History')
    plt.xlabel('Epoch')
    plt.ylabel('Value')
    plt.legend()
    plt.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    return learner

# Run the end-to-end example
if __name__ == "__main__":
    print("=== End-to-end learning example ===")
    comprehensive_learning_example()

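Practical Training Techniques

Weight Initialization

Appropriate weight initialization is critical for training neural networks.

Common Initialization Methods

Zero initialization: all weights set to 0 (not recommended: every neuron in a layer then receives the same gradient, so the symmetry between neurons is never broken)
Random initialization: small random numbers
Xavier initialization: suited to Sigmoid and Tanh activations
He initialization: suited to ReLU activations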
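The symmetry problem with zero initialization can be observed directly. The sketch below trains a small 2-4-1 sigmoid network from all-zero parameters and shows that the four hidden units remain identical copies of one another. The network, the data, and the use of the binary cross-entropy gradient `out - y` are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(51, 2))
y = (X[:, :1] + X[:, 1:] > 0).astype(float)  # shape (51, 1)

# Zero initialization of a 2-4-1 network
W1, b1 = np.zeros((2, 4)), np.zeros((1, 4))
W2, b2 = np.zeros((4, 1)), np.zeros((1, 1))

eta = 1.0
for _ in range(100):
    a1 = sigmoid(X @ W1 + b1)
    out = sigmoid(a1 @ W2 + b2)
    dz2 = (out - y) / len(X)            # sigmoid output with binary cross-entropy
    dz1 = (dz2 @ W2.T) * a1 * (1 - a1)  # backpropagate through the hidden sigmoid
    dW2, db2 = a1.T @ dz2, dz2.sum(axis=0, keepdims=True)
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0, keepdims=True)
    W2 -= eta * dW2
    b2 -= eta * db2
    W1 -= eta * dW1
    b1 -= eta * db1

print(W1)  # every column (one per hidden unit) is the same
```

The weights do move away from zero, but all four columns of `W1` stay equal, so the hidden layer behaves like a single unit. Random initialization breaks this tie.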

class WeightInitializer:
    """Weight initialization helpers."""

    @staticmethod
    def zeros(shape):
        """Zero initialization."""
        return np.zeros(shape)

    @staticmethod
    def random_normal(shape, mean=0, std=0.01):
        """Random initialization from a normal distribution."""
        return np.random.normal(mean, std, shape)

    @staticmethod
    def xavier_uniform(fan_in, fan_out):
        """Xavier uniform initialization."""
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return np.random.uniform(-limit, limit, (fan_in, fan_out))

    @staticmethod
    def xavier_normal(fan_in, fan_out):
        """Xavier normal initialization."""
        std = np.sqrt(2.0 / (fan_in + fan_out))
        return np.random.normal(0, std, (fan_in, fan_out))

    @staticmethod
    def he_uniform(fan_in, fan_out):
        """He uniform initialization."""
        limit = np.sqrt(6.0 / fan_in)
        return np.random.uniform(-limit, limit, (fan_in, fan_out))

    @staticmethod
    def he_normal(fan_in, fan_out):
        """He normal initialization."""
        std = np.sqrt(2.0 / fan_in)
        return np.random.normal(0, std, (fan_in, fan_out))

def compare_weight_initialization():
    """Compare the effect of different weight initialization methods."""

    # Generate test data
    np.random.seed(42)
    X = np.random.randn(200, 2)
    y = ((X[:, 0] + X[:, 1]) > 0).astype(int).reshape(-1, 1)

    # Initialization methods to compare
    init_methods = {
        'Random Small': lambda fi, fo: np.random.randn(fi, fo) * 0.01,
        'Xavier Normal': WeightInitializer.xavier_normal,
        'He Normal': WeightInitializer.he_normal
    }

    plt.figure(figsize=(15, 5))

    for i, (name, init_func) in enumerate(init_methods.items()):
        # Create the network
        nn = SimpleNeuralNetwork(2, 8, 1, learning_rate=0.1)

        # Overwrite the weights with the chosen initialization
        nn.W1 = init_func(2, 8)
        nn.W2 = init_func(8, 1)

        # Train the network
        nn.train(X, y, epochs=500, verbose=False)

        # Plot the loss curve
        plt.subplot(1, 3, i+1)
        plt.plot(nn.loss_history)
        plt.title(f'{name} initialization')
        plt.xlabel('Epoch')
        plt.ylabel('Loss')
        plt.grid(True, alpha=0.3)
        plt.yscale('log')

    plt.tight_layout()
    plt.show()

# Run the comparison
if __name__ == "__main__":
    print("=== Weight initialization comparison ===")
    compare_weight_initialization()