Evolutionary Computation and Artificial Intelligence: Optimizing Neural Networks for Image Classification Using a Genetic Algorithm (with code)

Overview:

This blog presents an implementation of a Genetic Algorithm (GA) to optimise the hyperparameters (design choices that dictate the shape, form and other internals) of an Artificial Neural Network (ANN) for classifying the MNIST dataset. ANNs come in a variety of shapes and forms, and choosing the right kind requires a deep understanding of them and can still be challenging. Instead of manually designing the ANN, the goal here is to automatically evolve a neural network that can accurately predict the correct numeric digit in each of the images in the MNIST dataset – a set of 70,000 images of handwritten digits, where each image is a 28×28 pixel grayscale image. The ANN is built using the Keras library, and the GA is implemented using the GA library in R.

Artificial Neural Networks (ANNs)

ANNs represent a popular approach to machine learning; they are inspired by the structure of the human brain and are often used to mimic tasks such as vision in a computational sense. In this case, the task is recognising handwritten numerical characters (digits) in a set of images. An ANN consists of layers of interconnected nodes (neurons), where each neuron receives inputs, performs a computation, and outputs a result that may feed into subsequent neurons or produce the final outcome, that is, a decision as to which digit exists in a given image. Essentially, each neuron is a computational unit that encodes a carefully selected mathematical function. A series of interconnected neurons gives the ANN the flexibility to encode the complex computational ability required for tasks such as computer vision.

The ANN is trained using a set of labeled data (MNIST dataset in this case); during training the ANN tunes its internal parameters called weights and biases to minimise the error between the predicted outputs (predicted digit) and the actual outputs (actual digit existing in the image).

Evolving Artificial Neural Networks (Neuroevolution)

The problem with ANNs is that there are many hyperparameters that need to be tuned to achieve optimal performance. Hyperparameters of an Artificial Neural Network (ANN) are like the settings of a musical instrument that can be adjusted to create the best sound. They are the design choices that define how a neural network learns and operates, and they can have a significant impact on the performance of the network.

For example, one important hyperparameter is the number of neurons or nodes in each layer (neurons are arranged in a series of layers in an ANN), which affects the network’s ability to extract useful features from the input data. Another hyperparameter is the learning rate, which controls how quickly the network tunes its weights during training and therefore impacts the speed and accuracy of the learning process.

The code given later automatically optimises the two aforementioned hyperparameters. However, it can easily be extended to tune other hyperparameters as well, as sketched below. For example, we can also tune the number of layers in the neural network, which determines how complex the network can be in processing the input data. Similarly, we can optimise the activation function (the mathematical function encoded in the neurons), which determines how strongly a neuron is activated by its input.
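As a rough illustration of such an extension, the sketch below lets the hyperparameter vector also carry the number of hidden layers and an index into a list of activation functions. The extra vector positions (hyperparameters[4] and hyperparameters[5]) and the helper function name are hypothetical and are not part of the code presented later.

# Hypothetical extension: let the GA also choose the number of hidden layers
# and the activation function. Positions 4 and 5 of the vector are illustrative.
build_extended_model <- function(hyperparameters) {
  layer_size  <- round(hyperparameters[1])              # neurons per hidden layer
  n_layers    <- round(hyperparameters[4])              # e.g. bounded in [1, 4]
  activations <- c("relu", "tanh", "sigmoid")
  act_fun     <- activations[round(hyperparameters[5])] # pick by index, bounded in [1, 3]

  model <- keras_model_sequential() %>%
    layer_flatten(input_shape = c(28, 28))
  for (i in seq_len(n_layers)) {
    model <- model %>% layer_dense(units = layer_size, activation = act_fun)
  }
  model <- model %>% layer_dense(units = 10, activation = "softmax")
  return(model)
}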

Without an automatic optimisation procedure, choosing the right hyperparameters for a neural network requires a deep understanding of the problem domain and the data being used, as well as a thorough knowledge of the network architecture and the various hyperparameters that can be adjusted to fine-tune its performance. This can be a time-consuming and difficult process.

By using a Genetic Algorithm (GA), we can automate the process of optimising the hyperparameters and evolve the neural network to achieve better performance. This process is also called Neuroevolution or Neural Architecture Search.

Scope of Optimisation:

In the given code, an ANN with 2 hidden layers is used. Therefore, the hyperparameters being optimised are (a concrete example of a candidate solution is sketched after this list):

  • number of neurons in the first hidden layer;
  • number of neurons in the second hidden layer;
  • and the learning rate of the optimiser used by the ANN to tune its internal parameters.
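Concretely, each candidate solution that the GA manipulates is just a numeric vector of length three; the values below are purely illustrative.

# One hypothetical candidate solution (an "individual" in GA terms):
# 64 neurons in hidden layer 1, 32 neurons in hidden layer 2, learning rate 0.01
candidate <- c(64, 32, 0.01)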

Choice of Genetic Algorithm (GA) as an Optimiser:

A genetic algorithm simulates biological evolution to solve computational problems. It represents an iterative process that produces and improves potential solutions to a problem. The algorithm starts by creating a random set of solutions, known as the population. Each solution in the population is evaluated using a fitness function, which measures how well a solution solves the problem. The solutions are then modified using two genetic operators: crossover and mutation. Crossover recombines two solutions to create a new solution that contains some of the characteristics of both parents. Mutation randomly changes some of the characteristics of a solution to create a new, slightly different solution.

After applying the genetic operators, the new set of solutions is evaluated using the fitness function, and the better performing solutions are selected to form the next generation of the population. This process is repeated for a number of generations until a satisfactory solution is found or a maximum number of generations has elapsed.

A GA is a good choice for optimising the hyperparameters of an ANN because it can search a large search space efficiently. This efficiency arises because the GA homes in on better performing combinations of hyperparameters without exhaustively or randomly trying all the possibilities. Instead, it uses what can be called a guided random search, in which better performing solutions within the overall search space of hyperparameters are selected probabilistically.
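To make this cycle explicit, the skeleton below sketches a generic GA loop in R. The helpers initialise_random_population(), evaluate_fitness(), select_parents(), crossover() and mutate() are hypothetical placeholders for what the GA package's ga() function performs internally in a single call.

# Schematic GA loop (illustrative only; the helper functions are placeholders)
population <- initialise_random_population(pop_size)
for (generation in 1:max_generations) {
  fitness    <- sapply(population, evaluate_fitness)  # score each candidate
  parents    <- select_parents(population, fitness)   # fitter candidates are more likely to be picked
  offspring  <- crossover(parents, p_crossover)       # recombine pairs of parents
  offspring  <- mutate(offspring, p_mutation)         # apply small random changes
  population <- offspring                             # form the next generation
}
best <- population[[which.max(sapply(population, evaluate_fitness))]]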

R Code:

The code below is organised as follows:

  • Load the necessary R packages;
  • Set up the dataset;
  • Set up a function that trains the ANN with the hyperparameters specified by the GA;
  • Define the fitness function that calls the above ANN training function with a given set of hyperparameters and returns the accuracy of the trained ANN as its fitness value;
  • Configure the GA and call the GA to start the optimisation process.

Now we detail the code below.

  • Load packages: The first step is to load the necessary packages, which are GA and keras. These packages provide the tools needed to perform genetic algorithm optimisation and to build and train neural networks.
    • Note: if they are not installed in your R environment already, you will first need to install these packages by calling: install.packages("GA"); install.packages("keras")
library(GA)
library(keras)
  • Load and Normalise the MNIST dataset: Next, the MNIST dataset is loaded using the dataset_mnist() function from the keras library. The images and labels from the training and test sets are stored in separate variables.
  • The pixel values of the images are normalised by dividing them by 255. This scales the pixel values between 0 and 1, which helps in the training process.
mnist <- dataset_mnist()
train_images <- mnist$train$x
train_labels <- mnist$train$y
test_images <- mnist$test$x
test_labels <- mnist$test$y

# Normalize the images
train_images <- train_images / 255
test_images <- test_images / 255
  • Record the best ANN discovered: A global variable is declared to keep track of the best ANN model discovered during the GA optimisation phase. This can be used at the end of the optimisation to recover the model and the relevant statistics concerning its training process.
# A global variable to track the best evolved model
bestModel <- NULL
  • Train the ANN:
# A function that trains a neural network using given hyperparameters
train_nn <- function(hyperparameters) {
  # Define hyperparameters
  layer1_size <- round(hyperparameters[1])
  layer2_size <- round(hyperparameters[2])
  l_rate <- hyperparameters[3]

  # Build the neural network
  model <- keras_model_sequential() %>%
    layer_flatten(input_shape = c(28, 28)) %>%
    layer_dense(units = layer1_size, activation = 'relu') %>%
    layer_dense(units = layer2_size, activation = 'relu') %>%
    layer_dense(units = 10, activation = 'softmax')

  # Configure the model before training begins. Therefore:
  # - Optimise the given loss function during training.
  # - Also, record accuracy on training and validation data.
  model %>% compile(
    loss = 'sparse_categorical_crossentropy',
    optimizer = optimizer_adam(learning_rate = l_rate),
    metrics = c('accuracy')
  )

  # Train the model
  history <- model %>% fit(
    train_images,
    train_labels,
    epochs = 5,
    batch_size = 128,
    validation_split = 0.2,
    verbose = 0
  )

  # Update the best evolved model
  if (is.null(bestModel) || history$metrics$val_accuracy[5] >
      bestModel$history$metrics$val_accuracy[5])
    bestModel <<- list(model = model, history = history)

  # Return the validation accuracy
  return (history$metrics$val_accuracy[5])
}

The above code defines a function train_nn that trains a neural network using the given hyperparameters. The hyperparameters include the sizes of the two hidden layers of the neural network (layer1_size and layer2_size) and the learning rate (l_rate). The function first defines the architecture of the neural network using the keras_model_sequential() function from the keras package. The network has: an input layer that takes an image of size 28×28; a flatten layer that flattens the 2-dimensional input image into a 1-dimensional vector; two dense layers (where each neuron in the layer is connected to every neuron in the preceding layer) with the specified sizes and activation functions (‘relu’ – rectified linear unit); and an output layer with 10 units (corresponding to each of the 10 digits) with a ‘softmax’ activation function that produces a probability of finding each of the 10 digits in a given image.

After defining the network, the function compiles the model with the specified loss function (sparse_categorical_crossentropy), optimiser (optimizer_adam with the specified learning rate), and evaluation metric (accuracy); compiling means the model is configured with these settings before training begins. Then, the function trains the model on the training set (train_images and train_labels) with the specified hyperparameters using the fit() function. The training is performed for five epochs with a batch size of 128 images and a validation split of 0.2 (20% of the training data is held out and used only to indicate overfitting). The verbose argument is set to 0, which means that no output will be displayed during training.

Finally, the function updates the best model using the validation accuracy. If the bestModel variable is null (no ANN model has been trained previously) or the validation accuracy of the current model is better than the previous best model, the bestModel variable is updated with the current model and its training history. The function returns the validation accuracy of the model after the fifth epoch.

  • Set up the Genetic Algorithm. Finally, the code below configures a Genetic Algorithm available from the GA package in R.
# Define the GA parameters
population_size <- 10
number_of_generations <- 3
mutation_probability <- 0.1
crossover_probability <- 0.8

# Define the range of hyperparameters to optimize
hyperparameters_range <- matrix(c(16, 128, 16, 128, 0.001, 0.1), ncol = 3, byrow = FALSE)

# Define the fitness function for the GA
fitness_function <- function(x) {
  accuracy <- train_nn(x)
  return (accuracy)
}

# Run the GA optimisation
result <- ga(type = "real-valued", fitness = fitness_function,
             lower = hyperparameters_range[1, ], upper = hyperparameters_range[2, ],
             popSize = population_size, maxiter = number_of_generations,
             pmutation = mutation_probability, pcrossover = crossover_probability,
             seed = 1)

This code performs hyperparameter optimisation using a genetic algorithm (GA). The GA parameters are defined as follows:

  • population_size: the number of individuals (sets of hyperparameters) in each generation.
  • number_of_generations: the maximum number of generations for the GA to run.
  • mutation_probability: the probability of a parameter being randomly mutated in an individual during reproduction.
  • crossover_probability: the probability of crossing over two selected individuals to create an offspring.

The range of hyperparameters to optimise is defined as a matrix hyperparameters_range, which specifies the lower and upper bounds for each hyperparameter. In this case, the matrix has 2 rows, corresponding to the lower and upper bounds of the hyperparameter set, and 3 columns, corresponding to the 3 hyperparameters (the layout is illustrated after this list). The hyperparameters being optimised, where x denotes a candidate solution vector, are:

  • x[1]: the number of units in the first dense layer.
  • x[2]: the number of units in the second dense layer.
  • x[3]: the learning rate for the optimiser.
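Because byrow = FALSE fills the matrix column by column, the first row ends up holding the lower bounds and the second row the upper bounds, which is exactly what the ga() call above passes via its lower and upper arguments.

# Layout of hyperparameters_range (filled column by column, byrow = FALSE):
#   row 1 -> lower bounds: 16 neurons, 16 neurons, learning rate 0.001
#   row 2 -> upper bounds: 128 neurons, 128 neurons, learning rate 0.1
hyperparameters_range[1, ]  # lower bounds, passed to ga() via 'lower'
hyperparameters_range[2, ]  # upper bounds, passed to ga() via 'upper'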

The fitness_function is defined as a function of the hyperparameters x. It trains a neural network with the given hyperparameters and returns the accuracy of the model on a validation set. This accuracy is the fitness value used by the GA to evaluate each individual.

The GA optimisation is performed using the ga function from the GA package in R. The type argument specifies that the hyperparameters are real-valued. The fitness argument specifies the fitness function to be used. The lower and upper arguments specify the lower and upper bounds of the hyperparameters, respectively. The remaining parameters are self-explanatory, except perhaps the seed argument: since a GA is a stochastic optimisation method, fixing a random seed makes it possible to reproduce a given execution, or run, of the GA. The result of the GA optimisation is stored in the result variable.
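Once the run completes, the best hyperparameters can be read off the returned object. summary() and the @solution slot are part of the GA package's interface; the rounding of the layer sizes below mirrors what train_nn does internally.

# Inspect the outcome of the GA run
summary(result)                   # settings, best fitness and best solution(s)

best_hp <- result@solution[1, ]   # best hyperparameter vector found
layer1  <- round(best_hp[1])      # neurons in hidden layer 1
layer2  <- round(best_hp[2])      # neurons in hidden layer 2
l_rate  <- best_hp[3]             # learning rate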

Results

plot(result)

The above code plots the result as below:

The result of running the Genetic Algorithm over only 3 generations. We can set the maximum number of generations to a higher value to generate extended results. The fitness value represents the validation accuracy. The best, mean and median performance of the population members is plotted.

The results above show the evolution of hyperparameters over 3 generations. Note that the number of generations is kept at only 3 to minimise the computational expense. Even with such a modest setting, since the population size is 10, 30 ANN models were trained over 60,000 images during this GA run. Thus we have sampled 30 different architectures.

We can retrieve the performance of the best model and evaluate it on the test data as below.

bestModel$model %>% evaluate(test_images, test_labels, verbose = 1)

# Output:
# 313/313 [==============================] - 0s 1ms/step - loss: 0.1315 - accuracy: 0.9603
#      loss  accuracy
# 0.1315127 0.9603000

Finally, we can plot the training history of the best model with the command plot(bestModel$history); as the figure below shows, this lets us see how the loss function and the accuracy on both the training and validation data improved as the training epochs progressed for this particular model.

The top half of the figure shows the optimisation (decrease) of the loss function; it plots the loss values on both the training and validation sets. The bottom half demonstrates the increase in accuracy over the same period across both the training and validation sets.
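For completeness, the plotting command referred to above is shown here, together with one way of keeping the evolved model for later use; save_model_hdf5() is a standard keras function, and the file name is arbitrary.

# Plot the loss and accuracy curves of the best evolved model
plot(bestModel$history)

# Optionally save the best model to disk for later reuse
save_model_hdf5(bestModel$model, "best_evolved_model.h5")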

Summary

In summary, we have presented sample R code that optimises a neural network architecture for image classification. Even though we used modest computational settings (population size of 10; maximum generations = 3), this still amounted to trying out 30 ANNs, which is a significant step up from hedging your bets on the first setting that comes to mind. Given the computational budget and patience, we can easily increase these settings to evolve a larger population of ANNs for a longer duration.

What we have not discussed here is that we can also “seed” the GA with hyperparameter sets that are known to work well and let the GA try to improve upon them further. However, more on that later.
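As a brief preview, one way to do this with the GA package is its suggestions argument, which injects user-supplied solutions into the initial population; the seed values below are purely illustrative.

# Hypothetical seeding example: two hand-picked hyperparameter sets
# (layer 1 size, layer 2 size, learning rate) joining the initial population
seed_solutions <- matrix(c(64, 32, 0.01,
                           128, 64, 0.001),
                         ncol = 3, byrow = TRUE)

result_seeded <- ga(type = "real-valued", fitness = fitness_function,
                    lower = hyperparameters_range[1, ],
                    upper = hyperparameters_range[2, ],
                    suggestions = seed_solutions,
                    popSize = population_size, maxiter = number_of_generations,
                    pmutation = mutation_probability,
                    pcrossover = crossover_probability, seed = 1)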
