I am currently working on building a neural network completely from scratch using base R. This post shows some of my progress thus far: a basic multilayer perceptron with full feed-forward and backpropagation capabilities. It uses basic Stochastic Gradient Descent (SGD) to optimize the parameters (weights & biases). In this post, it performs a simple classification task on some simulated data.
When the entire project is complete, I hope to have a fully usable
system of neural network functions complete with different optimization
methods (such as Adam), hyperparameters (like learning rate decay and
momentum), and out-of-sample prediction capabilities. I’ve been writing
up the process in a long-form blog/tutorial which I plan to post for
anyone who is interested.
Learning how each component of a neural network works and then building each one in code has been an enlightening experience, and it has really deepened my understanding of neural networks and their capabilities. It has also
been a unique challenge to build the network using base R, which I
believe is worthwhile since it prevents me from using pre-existing
packages as quick fixes to complicated problems (and also avoids any
dependency issues). At the end of the day, the calculations performed by
the computer are fairly basic dot products and matrix multiplications,
so advanced packages aren’t really necessary.
This project is inspired by the “Neural
Networks from Scratch in Python” book, which does an excellent job
explaining the intuition, math, and implementation of each step in a
neural network. I cannot recommend it enough.
The function below simulates data in a spiral form with 3 classes, as seen in the plot below. This provides us with a basic multi-class classification task, where we want the neural network to learn the decision boundaries that effectively separate each class. The function is inspired by this Stanford course.
## A function to simulate spiral data
sim_data_fn = function(
N = 100, # number of points per class
D = 2, # dimensionality (number of features)
K = 3, # number of classes
random_order = FALSE
){
set.seed(123)
X = data.frame() # data matrix (each row = single sample)
y = data.frame() # class labels
for (j in (1:K)){
r = seq(0.05,1,length.out = N) # radius
t = seq((j-1)*4.7,j*4.7, length.out = N) + rnorm(N, sd = 0.3) # theta
Xtemp = data.frame(x =r*sin(t) , y = r*cos(t))
ytemp = data.frame(matrix(j, N, 1))
X = rbind(X, Xtemp)
y = rbind(y, ytemp)
}
spiral_data = cbind(X,y)
colnames(spiral_data) = c(colnames(X), 'label')
# Want randomly ordered labels?
if (random_order==TRUE) {spiral_data$label = sample(1:3, size = N*K,
replace = TRUE)}
return(spiral_data)
}
spiral_data = sim_data_fn(N=100) #100 obs per class, total of 300 samples
plot(spiral_data$x, spiral_data$y, col = spiral_data$label)
All of the functions used here were built from scratch using R's base package. This is everything we need to perform a basic forward pass, backward pass, and parameter optimization.
Included are functions to:
- Initialize random parameters (weights & biases) for each layer
(hidden and output layers)
- Create a densely connected layer with a specified number of neurons.
- Apply ReLU or Softmax Activation Functions to each neuron.
- Calculate loss using categorical cross-entropy.
- Backpropagate through each step to determine the gradients that indicate how to change every weight and bias.
- Apply the gradients to the current weights and biases using an
optimizer (Stochastic Gradient Descent).
# Initialize Parameters for a Layer----
init_params = function(n_inputs = "# of features", n_neurons){
weights = matrix(data = (0.1 * rnorm(n = n_inputs*n_neurons)),
nrow = n_inputs, ncol = n_neurons)
  #Number of weights = number of inputs * number of neurons,
  #since every connection between a previous neuron (or input feature) and
  #a current neuron has an associated weight
biases = matrix(data = 0, nrow = 1, ncol = n_neurons)
#Number of biases = the number of neurons in a layer
#saves:
list("weights"=weights, "biases"=biases)
}
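#A quick sanity check (demo only; the demo_params name is made up for illustration):
#a layer taking 2 inputs into 4 neurons should get a 2 x 4 weight matrix
#and a 1 x 4 bias matrix
demo_params = init_params(n_inputs = 2, n_neurons = 4)
dim(demo_params$weights) #should be 2 4 (n_inputs x n_neurons)
dim(demo_params$biases) #should be 1 4 (1 x n_neurons)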
# Create a Dense Layer ----
layer_dense = list(
## FORWARD PASS
forward = function(
inputs,
n_neurons,
parameters) {#begin fn
    n_inputs = ncol(inputs)
    #determine number of inputs per sample from the dims of the input matrix;
    #should equal the # of features (i.e., columns) in a sample (i.e., a row)
    weights = parameters$weights
    biases = parameters$biases
    #biases have shape 1 x n_neurons (initialized as zeros in init_params)
    #Forward Pass: matrix product plus each neuron's bias,
    #with the bias row recycled across all samples
    wx = inputs%*%weights
    output = wx + biases[col(wx)]
#SAVE:
list("output" = output, #for forward pass
"inputs" = inputs, "weights"= weights, "biases" = biases) #for backprop
},
## BACKWARD PASS
backward = function(inputs, weights, dvalues){
#Gradients on parameters
dweights = t(inputs)%*%dvalues
dbiases = colSums(dvalues)
#Gradient on values
dinputs = dvalues%*%t(weights)
#save:
list("dinputs"=dinputs,
"dweights"=dweights,
"dbiases"=dbiases)
}
)
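#Demo (not part of the network; demo_* names are made up for illustration):
#pass 5 two-feature samples through the 4-neuron layer defined by demo_params above;
#the output should be a 5 x 4 matrix (samples x neurons)
demo_inputs = matrix(rnorm(10), nrow = 5, ncol = 2)
demo_dense = layer_dense$forward(inputs = demo_inputs,
                                 n_neurons = 4,
                                 parameters = demo_params)
dim(demo_dense$output) #should be 5 4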
### Activation Functions ----
## ReLU
activation_ReLU = list(
#FORWARD PASS
  forward = function(input_layer){
    #element-wise max(0, x); pmax keeps the dims of the input matrix,
    #so the output shape matches the input shape
    output = pmax(input_layer, 0)
    #Function saves:
    list("output" = output, "inputs" = input_layer)
  },
#BACKWARD PASS
backward = function(inputs, dvalues){
dinputs = dvalues
dinputs[inputs <= 0] = 0
#save:
list("dinputs"=dinputs)
}
)
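#Demo (not part of the network; demo_relu is made up for illustration):
#ReLU zeroes out negative entries and keeps the matrix shape
demo_relu = activation_ReLU$forward(matrix(c(-1, 2, 0.5, -3), nrow = 2))
demo_relu$output #negative entries become 0; positive entries pass through unchanged
dim(demo_relu$output) #should be 2 2, the same shape as the input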
## SoftMax
activation_Softmax = list(
#FORWARD PASS
  forward = function(inputs){
    #scale inputs for numerical stability:
    #subtract each sample's (row's) maximum so the largest value passed to exp() is 0
    max_value = apply(X = inputs, MARGIN = 1, FUN = max)
    scaled_inputs = inputs - max_value #recycling subtracts row i's max from every entry in row i
    # exponentiate
    exp_values = exp(scaled_inputs)
    # normalize each row so its probabilities sum to 1
    norm_base = rowSums(exp_values)
    probabilities = exp_values/norm_base #recycling divides row i by row i's sum
    return(probabilities)
  },
#BACKWARD PASS
  backward = function(softmax_output, dvalues){
    #*Single-sample version; the network below uses the combined
    #Softmax/Cross-Entropy backward step instead*
    #flatten the output and the incoming gradient
    flat_output = as.vector(softmax_output)
    flat_dvalues = as.vector(dvalues)
    #calculate jacobian matrix of the softmax output
    jacobian_matrix = diag(flat_output) - flat_output%*%t(flat_output)
    #calculate sample-wise gradient
    dinputs = jacobian_matrix%*%flat_dvalues
    return(dinputs)
  }
)
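#Demo (not part of the network; demo_* names are made up for illustration):
#softmax turns each row of raw scores into probabilities
demo_scores = matrix(c(2, 1, 0.1,
                       0, 0, 0), nrow = 2, byrow = TRUE)
demo_probs = activation_Softmax$forward(demo_scores)
rowSums(demo_probs) #each row should sum to 1
demo_probs #within a row, the largest score receives the largest probability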
### Loss ----
Categorical_CrossEntropy = list(
#FORWARD PASS
  forward = function(y_pred = "softmax output", y_true = "targets"){
    #DETECT NUMBER OF SAMPLES
    samples = length(y_true)
    #CLIP PREDICTIONS TO AVOID A -Inf ERROR FROM log(0)
    y_pred_clipped = ifelse(y_pred <= 1e-7, 1e-7,
                            ifelse(y_pred >= (1-1e-7), (1-1e-7), y_pred))
    #DETERMINE IF Y_TRUE IS ONE-HOT-ENCODED AND SELECT CORRESPONDING CONFIDENCES
    if (nrow(t(y_true)) == 1) {
      #y_true is a single vector of labels (i.e., sparse)
      confidences = y_pred_clipped[cbind(1:samples, y_true)]
    } else if (nrow(y_true) > 1) {
      #y_true is one-hot encoded
      confidences = rowSums(y_pred_clipped*y_true)
    } else stop("error indexing the predicted class confidences")
    #CALC LOSS FOR EACH SAMPLE (ROW)
    neg_log_likelihoods = -log(confidences)
    return(neg_log_likelihoods)
  },
#BACKWARD PASS
  backward = function(y_true, dvalues){
    #number of samples (rows of the prediction/gradient matrix)
    samples = nrow(dvalues)
    #number of labels/classes (columns)
    labels = ncol(dvalues)
    #if labels are sparse (a vector of class indices), turn them into a one-hot encoded matrix
    if (nrow(t(y_true)) == 1) {
      y_true = diag(labels)[y_true, ]
    }
    #calculate gradient
    dinputs = -y_true/dvalues
    #normalize gradient
    dinputs = dinputs/samples
    return(dinputs)
  }
)
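#Demo (not part of the network; demo_preds is made up for illustration):
#the loss is small when the predicted probability of the true class is high,
#and large when it is low
demo_preds = matrix(c(0.7, 0.2, 0.1,
                      0.1, 0.2, 0.7), nrow = 2, byrow = TRUE)
Categorical_CrossEntropy$forward(y_pred = demo_preds, y_true = c(1, 1))
#first sample (confident & correct): -log(0.7), roughly 0.36
#second sample (confident & wrong): -log(0.1), roughly 2.30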
#Softmax Activation X Cross-Entropy Loss Combination ----
# combine for faster backprop step (derivatives simplify nicely)
activation_loss_SoftmaxCrossEntropy = list(
#FORWARD PASS
forward = function(inputs, y_true){
#output layer's activation function
softmax_out = activation_Softmax$forward(inputs)
#calculate loss
loss = Categorical_CrossEntropy$forward(softmax_out, y_true)
#function saves:
list("softmax_output"=softmax_out, "loss"=loss)
},
#BACKWARD PASS
backward = function(dvalues, y_true){
#Detect number of samples
if (is.vector(dvalues)) { #if one sample
samples = 1
} else if (is.array(dvalues)) { #else if multiple samples
samples = nrow(dvalues)
    } else stop("error checking shape of inputs")
#Reverse One-Hot Encoding
    #if labels are one-hot encoded, turn them back into discrete class labels
    ##helper function: each sample's class is the column that holds the 1
    anti_ohe = function(y_true){
      max.col(y_true)
    }
##check & modify
y_true = if(is.array(y_true)){ #if one-hot encoded
#change to sparse
anti_ohe(y_true)
} else y_true
#Calculate gradient
#Copy so we can modify
dinputs = dvalues
#Calculate gradient
#index the prediction array with the sample number and its
#true value index, subtracting 1 from these values. Requires discrete,
#not one-hot encoded, true labels (explaining the need for the above step)
dinputs[cbind(1:samples, y_true)] = dinputs[cbind(1:samples, y_true)] - 1
#Normalize gradient
dinputs = dinputs/samples
#save desired outputs
list("dinputs" = dinputs, "samples" = samples, "y_true" = y_true)
}
)
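#Demo (not part of the network; demo_softmax_out is made up for illustration):
#for a single sample, the combined backward step reduces to
#(softmax output - one-hot true vector) / number of samples
demo_softmax_out = matrix(c(0.7, 0.2, 0.1), nrow = 1)
activation_loss_SoftmaxCrossEntropy$backward(dvalues = demo_softmax_out,
                                             y_true = 1)$dinputs
#expected: -0.3, 0.2, 0.1 (i.e., c(0.7, 0.2, 0.1) - c(1, 0, 0), divided by 1)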
# Optimizers----
## Basic Stochastic Gradient Descent (SGD)
optimizer_SGD = list(
update_params = function(layer_forward, layer_backward,
learning_rate = 1){ #default value
#current params
current_weights = layer_forward$weights
current_biases = layer_forward$biases
#gradients
weight_gradients = layer_backward$dweights
bias_gradients = layer_backward$dbiases
#update
weights = current_weights - learning_rate*weight_gradients
biases = current_biases - learning_rate*bias_gradients
#save:
list("weights" = weights, "biases" = biases)
}
)
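#Demo (not part of the network; demo_* objects are made up for illustration):
#a single SGD step moves each parameter against its gradient, scaled by the learning rate
demo_fwd = list(weights = matrix(c(0.5, -0.2), 1, 2), biases = matrix(0, 1, 2))
demo_bwd = list(dweights = matrix(c(0.1, -0.4), 1, 2), dbiases = matrix(c(0.05, 0), 1, 2))
optimizer_SGD$update_params(layer_forward = demo_fwd, layer_backward = demo_bwd,
                            learning_rate = 1)
#expected weights: 0.4, 0.2; expected biases: -0.05, 0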
Here I build a network to predict the classes (1, 2, or 3) of the
spiral data based on the x and y coordinates, which represent the
“features” in this dataset. Parameters are initialized randomly and then
I use a for-loop to iterate through the forward and backward pass over
2500 epochs.
At the bottom of the loop, the optimizer applies the calculated gradients to the current weights and biases, and the layer parameters are overwritten with the newly updated values before the next epoch begins.
set.seed(1) ##set seed for results replication
#Data: use x and y coordinates as features
spiral_X = spiral_data[,c("x","y")]
spiral_X = as.matrix(spiral_X) ##convert to matrix
#Initialize Weights & Biases Outside of Loop
l1_params = init_params(n_inputs = 2, ## ncol(spiral_X) would also work
n_neurons = 64) ## = to desired # neurons in layer
l2_params = init_params(n_inputs = 64, ## = to n_neurons in prior layer
n_neurons = 3) ## = to desired # neurons in layer
# EPOCH LOOP
tot_epochs = 2500L
for (epoch in 1:tot_epochs) {
#Forward-pass
layer1 = layer_dense$forward(inputs = spiral_X, ##feed in data
n_neurons = 64, ##specify desired neurons
parameters = l1_params) ##provide layer parameters
layer1_relu = activation_ReLU$forward(input_layer = layer1$output) ##apply ReLU
layer2 = layer_dense$forward(inputs = layer1_relu$output,##feed in ReLU output
n_neurons = 3, ##3 neurons - 1 for each class
parameters = l2_params) ##provide params
output =
activation_loss_SoftmaxCrossEntropy$forward(inputs = layer2$output,
y_true = spiral_data$label)
##uses a combined softmax/crossentropy fn to apply a softmax activation function and calculate loss.
#metrics:
LOSS = mean(output$loss) ##mean loss across all samples
PRED = max.col(output$softmax_output, ties.method = "random")
##extract predictions based on category with highest predicted probability
ACC = mean(PRED==spiral_data$label) ##calculate accuracy based on portion of correctly labeled samples
#Backward-pass
loss_softm_b = activation_loss_SoftmaxCrossEntropy$backward( ##backprop softmax fn
dvalues = output$softmax_output,
y_true = spiral_data$label)
l2_b = layer_dense$backward(inputs = layer2$inputs, ##backprop layer2, pass in softmax derivatives (chain rule)
weights = layer2$weights,
dvalues = loss_softm_b$dinputs)
l1_relu_b = activation_ReLU$backward(inputs = layer1_relu$inputs, ##backprop layer1's ReLU, pass in layer2 derivatives
dvalues = l2_b$dinputs)
l1_b = layer_dense$backward(inputs = layer1$inputs, ##backprop layer1, pass in layer1 ReLU derivatives
weights = layer1$weights,
dvalues = l1_relu_b$dinputs)
#Optimize parameters with learning rate of 0.5
l1_optim_params = optimizer_SGD$update_params(layer_forward = layer1,
layer_backward = l1_b,
learning_rate = 0.5)
l2_optim_params = optimizer_SGD$update_params(layer_forward = layer2,
layer_backward = l2_b,
learning_rate = 0.5)
#Update weights & biases with newly calculated params
l1_params = l1_optim_params
l2_params = l2_optim_params
#loop repeats with new weights & biases...
#---
#Status Report:
if (epoch == 1){print(c("epoch","loss","accuracy"))}
if (epoch %in% seq(0,tot_epochs,by=100)){
print(c( epoch,
LOSS,
ACC) )}
#Save Final Metrics:
if (epoch==tot_epochs){
out = list("loss"=LOSS,
"accuracy"=ACC,
"predictions"=PRED)}
}#end loop
## [1] "epoch" "loss" "accuracy"
## [1] 100.0000000 1.1481168 0.4666667
## [1] 200.0000000 1.0666849 0.5633333
## [1] 300.0000000 0.8856428 0.6766667
## [1] 400.0000000 0.7638010 0.6966667
## [1] 500.0000000 0.6494212 0.7433333
## [1] 600.0000000 0.6265188 0.7633333
## [1] 700.0000000 1.3210822 0.8266667
## [1] 800.0000000 1.7219226 0.8533333
## [1] 900.000000 1.715416 0.840000
## [1] 1000.0000000 0.9714231 0.9066667
## [1] 1100.0000000 0.1680926 0.8733333
## [1] 1200.0000000 0.2213610 0.8433333
## [1] 1300.0000000 0.3538507 0.9333333
## [1] 1400.0000000 0.7161815 0.9766667
## [1] 1500.00000 10.46614 0.49000
## [1] 1600.0000000 1.9601171 0.9066667
## [1] 1700.0000000 1.3836203 0.9466667
## [1] 1800.0000000 1.1736429 0.9666667
## [1] 1900.0000000 0.7299922 0.9833333
## [1] 2000.0000000 0.6426703 0.9933333
## [1] 2100.0000000 0.5317172 0.9966667
## [1] 2200.0000000 0.4946968 0.9966667
## [1] 2300.0000000 0.5027338 0.9966667
## [1] 2400.0000000 0.4728830 0.9966667
## [1] 2500.0000000 0.4751791 0.9966667
The neural net is able to learn the classes quite effectively: as it minimizes the loss, the accuracy of its predictions steadily increases. Of course, memorizing the training data is not an especially hard task for a model. To truly test this network we would need to evaluate it on held-out test data, something I plan to do when my full neural net package is complete (a minimal sketch of such a prediction step appears at the end of this post). For now, I am happy with what this program is capable of.
Final Metrics
#Loss:
out$loss
## [1] 0.4751791
#Accuracy:
out$accuracy
## [1] 0.9966667
#Predictions
cbind(predicted_label = out$predictions, actual_label = spiral_data$label)
## predicted_label actual_label
## [1,] 1 1
## [2,] 1 1
## [3,] 1 1
## [4,] 1 1
## [5,] 1 1
## [6,] 1 1
## [7,] 1 1
## [8,] 1 1
## [9,] 1 1
## [10,] 1 1
## [11,] 1 1
## [12,] 1 1
## [13,] 1 1
## [14,] 1 1
## [15,] 1 1
## [16,] 1 1
## [17,] 1 1
## [18,] 1 1
## [19,] 1 1
## [20,] 1 1
## [21,] 1 1
## [22,] 1 1
## [23,] 1 1
## [24,] 1 1
## [25,] 1 1
## [26,] 1 1
## [27,] 1 1
## [28,] 1 1
## [29,] 1 1
## [30,] 1 1
## [31,] 1 1
## [32,] 1 1
## [33,] 1 1
## [34,] 1 1
## [35,] 1 1
## [36,] 1 1
## [37,] 1 1
## [38,] 1 1
## [39,] 1 1
## [40,] 1 1
## [41,] 1 1
## [42,] 1 1
## [43,] 1 1
## [44,] 1 1
## [45,] 1 1
## [46,] 1 1
## [47,] 1 1
## [48,] 1 1
## [49,] 1 1
## [50,] 1 1
## [51,] 1 1
## [52,] 1 1
## [53,] 1 1
## [54,] 1 1
## [55,] 1 1
## [56,] 1 1
## [57,] 1 1
## [58,] 1 1
## [59,] 1 1
## [60,] 1 1
## [61,] 1 1
## [62,] 1 1
## [63,] 1 1
## [64,] 1 1
## [65,] 1 1
## [66,] 1 1
## [67,] 1 1
## [68,] 1 1
## [69,] 1 1
## [70,] 1 1
## [71,] 1 1
## [72,] 1 1
## [73,] 1 1
## [74,] 1 1
## [75,] 1 1
## [76,] 1 1
## [77,] 1 1
## [78,] 1 1
## [79,] 1 1
## [80,] 1 1
## [81,] 1 1
## [82,] 1 1
## [83,] 1 1
## [84,] 1 1
## [85,] 1 1
## [86,] 1 1
## [87,] 1 1
## [88,] 1 1
## [89,] 1 1
## [90,] 1 1
## [91,] 1 1
## [92,] 1 1
## [93,] 1 1
## [94,] 1 1
## [95,] 1 1
## [96,] 1 1
## [97,] 1 1
## [98,] 1 1
## [99,] 1 1
## [100,] 1 1
## [101,] 2 2
## [102,] 2 2
## [103,] 2 2
## [104,] 2 2
## [105,] 2 2
## [106,] 2 2
## [107,] 2 2
## [108,] 3 2
## [109,] 2 2
## [110,] 2 2
## [111,] 2 2
## [112,] 2 2
## [113,] 2 2
## [114,] 2 2
## [115,] 2 2
## [116,] 2 2
## [117,] 2 2
## [118,] 2 2
## [119,] 2 2
## [120,] 2 2
## [121,] 2 2
## [122,] 2 2
## [123,] 2 2
## [124,] 2 2
## [125,] 2 2
## [126,] 2 2
## [127,] 2 2
## [128,] 2 2
## [129,] 2 2
## [130,] 2 2
## [131,] 2 2
## [132,] 2 2
## [133,] 2 2
## [134,] 2 2
## [135,] 2 2
## [136,] 2 2
## [137,] 2 2
## [138,] 2 2
## [139,] 2 2
## [140,] 2 2
## [141,] 2 2
## [142,] 2 2
## [143,] 2 2
## [144,] 2 2
## [145,] 2 2
## [146,] 2 2
## [147,] 2 2
## [148,] 2 2
## [149,] 2 2
## [150,] 2 2
## [151,] 2 2
## [152,] 2 2
## [153,] 2 2
## [154,] 2 2
## [155,] 2 2
## [156,] 2 2
## [157,] 2 2
## [158,] 2 2
## [159,] 2 2
## [160,] 2 2
## [161,] 2 2
## [162,] 2 2
## [163,] 2 2
## [164,] 2 2
## [165,] 2 2
## [166,] 2 2
## [167,] 2 2
## [168,] 2 2
## [169,] 2 2
## [170,] 2 2
## [171,] 2 2
## [172,] 2 2
## [173,] 2 2
## [174,] 2 2
## [175,] 2 2
## [176,] 2 2
## [177,] 2 2
## [178,] 2 2
## [179,] 2 2
## [180,] 2 2
## [181,] 2 2
## [182,] 2 2
## [183,] 2 2
## [184,] 2 2
## [185,] 2 2
## [186,] 2 2
## [187,] 2 2
## [188,] 2 2
## [189,] 2 2
## [190,] 2 2
## [191,] 2 2
## [192,] 2 2
## [193,] 2 2
## [194,] 2 2
## [195,] 2 2
## [196,] 2 2
## [197,] 2 2
## [198,] 2 2
## [199,] 2 2
## [200,] 2 2
## [201,] 3 3
## [202,] 3 3
## [203,] 3 3
## [204,] 3 3
## [205,] 3 3
## [206,] 3 3
## [207,] 3 3
## [208,] 3 3
## [209,] 3 3
## [210,] 3 3
## [211,] 3 3
## [212,] 3 3
## [213,] 3 3
## [214,] 3 3
## [215,] 3 3
## [216,] 3 3
## [217,] 3 3
## [218,] 3 3
## [219,] 3 3
## [220,] 3 3
## [221,] 3 3
## [222,] 3 3
## [223,] 3 3
## [224,] 3 3
## [225,] 3 3
## [226,] 3 3
## [227,] 3 3
## [228,] 3 3
## [229,] 3 3
## [230,] 3 3
## [231,] 3 3
## [232,] 3 3
## [233,] 3 3
## [234,] 3 3
## [235,] 3 3
## [236,] 3 3
## [237,] 3 3
## [238,] 3 3
## [239,] 3 3
## [240,] 3 3
## [241,] 3 3
## [242,] 3 3
## [243,] 3 3
## [244,] 3 3
## [245,] 3 3
## [246,] 3 3
## [247,] 3 3
## [248,] 3 3
## [249,] 3 3
## [250,] 3 3
## [251,] 3 3
## [252,] 3 3
## [253,] 3 3
## [254,] 3 3
## [255,] 3 3
## [256,] 3 3
## [257,] 3 3
## [258,] 3 3
## [259,] 3 3
## [260,] 3 3
## [261,] 3 3
## [262,] 3 3
## [263,] 3 3
## [264,] 3 3
## [265,] 3 3
## [266,] 3 3
## [267,] 3 3
## [268,] 3 3
## [269,] 3 3
## [270,] 3 3
## [271,] 3 3
## [272,] 3 3
## [273,] 3 3
## [274,] 3 3
## [275,] 3 3
## [276,] 3 3
## [277,] 3 3
## [278,] 3 3
## [279,] 3 3
## [280,] 3 3
## [281,] 3 3
## [282,] 3 3
## [283,] 3 3
## [284,] 3 3
## [285,] 3 3
## [286,] 3 3
## [287,] 3 3
## [288,] 3 3
## [289,] 3 3
## [290,] 3 3
## [291,] 3 3
## [292,] 3 3
## [293,] 3 3
## [294,] 3 3
## [295,] 3 3
## [296,] 3 3
## [297,] 3 3
## [298,] 3 3
## [299,] 3 3
## [300,] 3 3
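As a preview of the out-of-sample prediction step mentioned above, here is a minimal sketch of how the trained parameters could be used to classify new points. The coordinates below are made up purely for illustration; the sketch simply reuses the layer functions and the final l1_params/l2_params from the training loop.
#Minimal sketch: classify a few new (made-up) points with the trained parameters
new_X = matrix(c( 0.10,  0.20,
                 -0.50,  0.40,
                  0.60, -0.30),
               ncol = 2, byrow = TRUE)
new_l1 = layer_dense$forward(inputs = new_X, n_neurons = 64, parameters = l1_params)
new_relu = activation_ReLU$forward(input_layer = new_l1$output)
new_l2 = layer_dense$forward(inputs = new_relu$output, n_neurons = 3, parameters = l2_params)
new_probs = activation_Softmax$forward(new_l2$output)
max.col(new_probs, ties.method = "random") #predicted class (1, 2, or 3) for each new point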