Introducing MyTorch: A Fully Custom, Tailored Deep Learning Framework

March - May 2023

Project Overview

The goal of this project was to create an object-oriented deep learning library called "MyTorch". The library exposes functions that can be called much like their PyTorch counterparts. MyTorch has working loss functions, an optimizer (stochastic gradient descent), and forward/backward passes, and it can be used to create fully functioning neural networks.

*Note: The project utilizes some supplemental functions taken from the d2l.ai textbook. With that said, all of the main functions listed and described here were developed with my own mathematical knowledge (and NumPy).
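
To make the end goal concrete, here is a rough sketch of how the pieces described below fit together in a tiny training loop. The class names and constructor signatures (Linear, MSELoss, SGD(model, lr)) are assumptions made for illustration based on the descriptions that follow, not necessarily MyTorch's exact API:

import numpy as np

# Hypothetical end-to-end usage of the MyTorch pieces described below.
layer = Linear(num_inputs=10, num_outputs=1)   # single linear layer
loss_fn = MSELoss()                            # mean squared error loss
optimizer = SGD(layer, lr=0.01)                # SGD over the layer's parameters

X = np.random.randn(32, 10)                    # a batch of inputs
Y = np.random.randn(32, 1)                     # matching targets

for epoch in range(100):
    O = layer.forward(X)          # forward pass
    L = loss_fn.forward(O, Y)     # scalar loss value
    dLdO = loss_fn.backward()     # gradient of the loss w.r.t. the output
    layer.backward(dLdO)          # stores dLdW and dLdb on the layer
    optimizer.step()              # SGD update of W and b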

Linear Class

The first step in development was to create the Linear class (linear.py), which initializes a single linear layer with given input/output sizes and performs the forward and backward operations.

The Linear class holds three functions. These are the init, forward, and backward functions, and they are described as follows:

  • Init Function This function takes in the number of inputs and outputs for the given layer, then initializes the weights and biases for the layer. The code looks like this:
def __init__(self, num_inputs, num_outputs):
    """
    Initialize the weights to be zero-mean Gaussian with 
    variance 0.01 and biases to zero.
 
    :param num_inputs: Number of inputs to layer.
    :param num_outputs: Number of outputs after layer.
    """
    self.W = np.random.randn(num_inputs, num_outputs)*0.1
    self.b = np.zeros(num_outputs)
  • Forward Function This function performs a forward pass through the linear layer, computing O = XW + b. In essence, the code looks like this:
def forward(self, X):
    """
    Compute the forward pass O = XW + b through the linear layer.
 
    :param X: Input data matrix of shape (batch_size, num_inputs).
    :return O: Output matrix of shape (batch_size, num_outputs).
    """
    self.X = X                      # cached for the backward pass
    O = X @ self.W + self.b
    return O
  • Backward Function This function performs the backward pass through the linear layer. It stores the parameter gradients dLdW and dLdb (used by the SGD step later) and returns the gradient with respect to the input:
def backward(self, dLdO):
    """
    Compute the backward pass through the linear layer. Stores the
    derivatives dLdW and dLdb, and returns dLdX.
 
    :param dLdO: Derivative of loss with respect to output O.
    :return dLdX: Derivative of loss with respect to input X.
    """
    self.dLdW = self.X.T @ dLdO     # gradient for the weights
    self.dLdb = dLdO.sum(0)         # gradient for the biases
    dLdX = dLdO @ self.W.T          # gradient passed back to the input
    return dLdX
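
To double-check the backward pass, the analytic gradient can be compared against a finite-difference estimate. The snippet below is a throwaway check, not part of MyTorch itself; it scores the output with the scalar function L = sum(O * G) for a fixed random matrix G, whose exact input gradient is G @ W.T:

import numpy as np

np.random.seed(0)
layer = Linear(4, 3)
X = np.random.randn(5, 4)
G = np.random.randn(5, 3)                  # fixed "upstream" gradient

layer.forward(X)                           # caches X inside the layer
dLdX = layer.backward(G)                   # analytic gradient w.r.t. X

eps = 1e-6
num_dLdX = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += eps
        Xm[i, j] -= eps
        # central difference of L(X) = sum(forward(X) * G) in entry (i, j)
        num_dLdX[i, j] = ((layer.forward(Xp) * G).sum()
                          - (layer.forward(Xm) * G).sum()) / (2 * eps)

print(np.max(np.abs(dLdX - num_dLdX)))     # should be tiny (around 1e-9 or smaller)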

MSE & Cross Entropy Loss Classes

The next step in development was to create the MSE (Mean Squared Error) and CE (Cross Entropy) loss classes and their respective functions. Each class contains a forward pass that computes its loss between the outputs and the true targets, and a backward pass that computes the gradient of that loss with respect to the output.

MSE Functions

def forward(self, O, Y):
    """
    Compute MSE loss between outputs O and true targets Y.
 
    :param O: Output predictions.
    :param Y: True targets.
    :return L: Mean squared error, normalized by total number
    of elements in O.
    """
    self.O = O
    self.Y = Y
    self.dims = O.shape
    self.N = O.shape[0]
    self.q = O.shape[1]
    L = np.sum((O - Y)**2) / np.prod(self.dims)
    return L
 
def backward(self):
    """
    Compute gradient dLdO for MSE loss.
 
    :return dLdO: Gradient of loss with respect to output O.
    """
    O = self.O
    Y = self.Y
    dLdO = 2*(O - Y) / np.prod(self.dims)
    return dLdO
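
As a quick sanity check of the two functions above, here is a tiny worked example (the class name MSELoss is assumed for illustration):

import numpy as np

mse = MSELoss()
O = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Y = np.array([[0.0, 2.0],
              [3.0, 8.0]])

L = mse.forward(O, Y)       # (1 + 0 + 0 + 16) / 4 = 4.25
dLdO = mse.backward()       # 2*(O - Y)/4 = [[0.5, 0.0], [0.0, -2.0]]
print(L, dLdO)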

CE Functions

def forward(self, O, Y):
    """
    Compute cross entropy loss between outputs O and true targets Y
    as well as softmax probabilities for outputs O.
    Note: Does not match PyTorch unless Y is a one-hot label matrix.
 
    :param O: Output predictions.
    :param Y: True targets.
    :return L: Cross entropy loss, normalized by number of examples.
    """
    self.O = O
    self.Y = Y
    self.N = O.shape[0]
    O_exp = np.exp(O)
    partition = O_exp.sum(1, keepdims=True)
    self.softmax = O_exp / partition
    L = -np.sum(Y * np.log(self.softmax)) / self.N
    return L
 
def backward(self):
    """
    Compute gradient dLdO for cross entropy loss.
 
    :return dLdO: Gradient of loss with respect to output O.
    """
    dLdO = (self.softmax - self.Y) / self.N
    return dLdO
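
Since the docstring notes that the CE loss only matches PyTorch when Y is a one-hot label matrix, integer class labels need to be converted before calling forward. A common NumPy pattern for that (not part of MyTorch itself):

import numpy as np

labels = np.array([2, 0, 1])          # integer class indices for 3 examples
num_classes = 4
Y = np.eye(num_classes)[labels]       # one-hot matrix of shape (3, 4)
# Y can now be passed to the CE forward pass alongside the raw outputs O.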

Stochastic Gradient Descent Optimization

After gaining the ability to pass gradients, stochastic gradient descent (SGD) was used to optimize the parameters of the linear layer. There is only one function in the SGD class, and it performs a single SGD optimization step:

def step(self):
    """
    Perform a single SGD step.
    """
    if hasattr(self.model, "layers"):
        # Model with multiple layers: update each layer's weights and biases.
        for i in range(self.L):
            dLdW = self.l[i].dLdW
            dLdb = self.l[i].dLdb
            self.l[i].W -= self.lr * dLdW
            self.l[i].b -= self.lr * dLdb
    else:
        # Single layer: update its parameters directly.
        dLdW = self.model.dLdW
        dLdb = self.model.dLdb
        self.model.W -= self.lr * dLdW
        self.model.b -= self.lr * dLdb
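
The step function above references a few attributes (self.model, self.lr, self.l, self.L) that are set up when the optimizer is constructed. The original constructor isn't shown here, so the sketch below is an assumption: a minimal __init__ that is consistent with step might look like this:

def __init__(self, model, lr):
    """
    Store the model and learning rate. If the model exposes a list of
    layers, cache that list and its length for the multi-layer branch
    of step(). (Sketch only; attribute names taken from step above.)
    """
    self.model = model
    self.lr = lr
    if hasattr(model, "layers"):
        self.l = model.layers
        self.L = len(model.layers)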

Convolutional Layers

With the base library complete, support for convolutional layers was added to MyTorch. In particular, the class implements the forward and backward passes of a two-dimensional convolutional layer. For simplicity, the bias term is ignored. Both passes are built on cross-correlation helpers in the style of the d2l.ai textbook; a sketch of those helpers follows the two code blocks below.

  • Forward Pass
def forward(self, X):
    """
    Forward operation of convolutional layer.
 
    :param X: Input image batch of size (batch_size, in_channels, height, width).
    :return O: Output feature map.
    """
    self.X = X
    O = np.stack([self.corr2d_multi_in_out(X[ii, :, :, :], self.W) 
                        for ii in range(X.shape[0])], 0)
    return O
  • Backward Pass
def backward(self, dLdO):
    """
    Backward operation of convolutional layer. Stores derivative dLdW, and returns
    dLdX.
 
    :param dLdO: Derivative of loss with respect to output.
    Obtained from backward operation on loss object.
    :returns dLdX: Derivative of loss with respect to input.
    """
    dLdW = np.zeros(self.W.shape)
    for c in range(self.W.shape[1]):
        for d in range(self.W.shape[0]):
            dLdW[d, c, :, :] = sum((self.corr2d(x, k) for x, k in zip(self.X[:,c,:,:], dLdO[:,d,:,:])), 0)
 
    kernel_size = dLdW.shape[-2:]
    in_channels = dLdW.shape[1]
    batch_size = self.X.shape[0]
    pad_height = kernel_size[0] - 1
    pad_width = kernel_size[1] - 1
    # Pad each spatial dimension of dLdO by (kernel size - 1) on both sides so that
    # correlating with the flipped kernel below amounts to a full convolution.
    pad_size = ((0, 0), (pad_height, pad_height), (pad_width, pad_width))
    dLdX = np.zeros(self.X.shape)
    for cc in range(in_channels):
        fW = np.flip(np.flip(self.W[:, cc, :, :], 1), 2)
        dLdX[:, cc, :, :] = np.stack([self.corr2d_multi_in(np.pad(dLdO[ii, :, :, :], pad_size), fW) 
                                        for ii in range(batch_size)], 0) 
    
    self.dLdW = dLdW
    return dLdX
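
The two passes above lean on the cross-correlation helpers from the d2l.ai textbook mentioned in the note at the top (corr2d, corr2d_multi_in, and corr2d_multi_in_out); in the class they are bound as methods, hence the self. prefix. For reference, NumPy versions of those helpers look roughly like this:

import numpy as np

def corr2d(X, K):
    """2D cross-correlation of a single-channel input X with kernel K."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def corr2d_multi_in(X, K):
    """Cross-correlate each input channel with its kernel slice and sum the results."""
    return sum(corr2d(x, k) for x, k in zip(X, K))

def corr2d_multi_in_out(X, K):
    """Apply corr2d_multi_in once per output channel (first axis of K) and stack."""
    return np.stack([corr2d_multi_in(X, k) for k in K], 0)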

RNN Layers

For the final part of this project, the ability to create Recurrent Neural Network layers was added to the MyTorch library. In particular, the forward and backward passes of a standard recurrent layer are implemented here. For simplicity, the bias term is ignored.

  • Forward Pass The forward pass of the RNN layer calculates the next hidden state with the recurrence H_t+1 = tanh(X_t Wxh + H_t Whh):
def forward(self, inputs, state=None):
    """
    Forward operation of RNN layer. Performs
    operation H_t+1 = tanh(Xt*Wxh + Ht*Whh).
 
    :param inputs: Input data matrix with shape (num_steps, batch_size, num_inputs).
    :param state: Initial hidden state with shape (batch_size, num_hiddens).
    :return outputs: Output data matrix after linear transformation.
    :return state: Final hidden state for each element in the batch.
    """
    if state is None:
        state = np.zeros((inputs.shape[1], self.num_hiddens))
 
    self.inputs = inputs            # cached for the backward pass
    outputs = []
    for X in inputs:
        part1 = X @ self.Wxh        # input-to-hidden term
        part2 = state @ self.Whh    # hidden-to-hidden term
        state = np.tanh(part1 + part2)
        outputs.append(state)
    
    outputs = np.array(outputs)
    self.states = outputs           # hidden state at each step, used in backward
    return outputs, state
  • Backward Pass The backward pass performs the backpropagation operations for each variable in the RNN layer. It also stores the derivatives dLdWxh and dLdWhh:
def backward(self, dLdO):
        """
        Backpropagation operation for variables in RNN
        layer. Stores derivatives dLdWxh, dLdWhh.
 
        :param dLdO: Derivative of loss with respect to output.
        Obtained from backward operation on loss object.
        :returns None:
        """
 
        dLd0 = dLdO
        dLdWxh = np.zeros_like(self.Wxh)
        dLdWhh = np.zeros_like(self.Whh)
        for i in range(len(self.inputs)):
            dLdWxh += self.inputs[i].T @ dLd0[i]
            dLdWhh += self.states[i].T @ dLd0[i]
            dLd0[i] = dLd0[i] @ self.W_hh.T
            dLd0[i] = dLd0[i] * (1 - self.states[i] ** 2)
        self.dLdWxh = dLdWxh
        self.dLdWhh = dLdWhh
        
        return None
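
As with the other layers, a quick shape check makes the data flow concrete. The constructor call below is an assumption (the original __init__ is not shown); it only needs to set num_hiddens, Wxh with shape (num_inputs, num_hiddens), and Whh with shape (num_hiddens, num_hiddens):

import numpy as np

rnn = RNN(num_inputs=3, num_hiddens=4)          # assumed constructor
X = np.random.randn(5, 2, 3)                    # (num_steps, batch_size, num_inputs)

outputs, state = rnn.forward(X)
print(outputs.shape, state.shape)               # (5, 2, 4) and (2, 4)

rnn.backward(np.random.randn(*outputs.shape))   # stores dLdWxh and dLdWhh
print(rnn.dLdWxh.shape, rnn.dLdWhh.shape)       # (3, 4) and (4, 4)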

Conclusion

Although general, I think the code for this project gives a pretty good insight into some of the mathematics behind deep learning libraries. The underlying math is fairly involved and beyond the scope of a simple project overview, but if you're interested in the foundations behind this project and deep/machine learning in general, I highly recommend the d2l.ai textbook. You'll need a solid foundation in applied linear algebra, but it's a great resource for learning how to implement deep learning algorithms (and more basic ones) from scratch.