Implementation of Artificial Neural Network for XOR Logic Gate with 2-bit Binary Input

Each connection (similar to a synapse) between artificial neurons can transmit a signal from one neuron to another. The receiving neuron can process the signal and then pass a signal on to the neurons connected to it. Now that we have defined everything we need, we'll create a training function. As inputs, we will pass the model, criterion, optimizer, X, y, and the number of iterations. We will also create a list in which to store the loss for each epoch, and a for loop that iterates once per epoch.
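The training function described above can be sketched as follows. This is a minimal PyTorch version, assuming the names `model`, `criterion`, `optimizer`, `X`, `y`, and the per-epoch loss list from the text; it is an illustration, not the article's exact code:

```python
import torch

def train(model, criterion, optimizer, X, y, iters):
    """Minimal training loop: one forward/backward pass per epoch,
    recording the loss for each epoch in a list."""
    all_loss = []
    for epoch in range(iters):
        y_hat = model(X)             # forward pass: predict outputs
        loss = criterion(y_hat, y)   # compare prediction with target
        all_loss.append(loss.item())
        optimizer.zero_grad()        # clear gradients from last step
        loss.backward()              # backpropagate the loss
        optimizer.step()             # update the weights
    return all_loss
```

The same loop works for any model/criterion/optimizer combination; only the objects passed in change.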

The XOR function

SGD works well for shallow networks, and for our XOR example we can use SGD. The selection of a suitable optimization strategy is a matter of experience, personal preference, and comparison.


The idea of increasing the learning rate for the scaling factor is worth pursuing to overcome the vanishing gradient problem for higher-dimensional inputs. However, the previous πt-neuron model does not suggest an optimized value of the learning rate, and it is difficult to choose an appropriate learning rate, or an initialization range for the scaling factors, for variable input dimensions. Therefore, a generalized solution to these issues of the previous model is still required. In this paper, we suggest a generalized model for solving the XOR and higher-order parity problems by enhancing the πt-neuron model. The products of the input-layer values and their respective weights are passed as input to the non-bias units in the hidden layer.

1.1. Generate Fake Data

The selection of loss and cost functions depends on the kind of output we are targeting. In Keras we have the binary cross-entropy cost function for binary classification and the categorical cross-entropy function for multi-class classification. In our XOR problem, the output is either 0 or 1 for each input sample, so we will use binary cross-entropy together with a sigmoid activation function at the output layer [Ref image 6]. Backpropagation is a supervised learning algorithm used to train neural networks.
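The sigmoid-plus-binary-cross-entropy pairing can be shown in a few lines of PyTorch. The raw outputs and targets below are made-up values for illustration; the point is that `BCELoss` matches the textbook cross-entropy formula applied to sigmoid outputs:

```python
import torch

# Sigmoid squashes raw outputs into (0, 1), so each value can be read
# as the probability of class 1; BCELoss then compares it to the
# 0/1 target. The numbers here are hypothetical.
raw = torch.tensor([2.0, -1.0, 0.5, -3.0])
y_hat = torch.sigmoid(raw)                   # probabilities in (0, 1)
y = torch.tensor([1.0, 0.0, 1.0, 0.0])       # binary targets

loss = torch.nn.BCELoss()(y_hat, y)

# The same quantity written out by hand:
# -mean( y*log(y_hat) + (1-y)*log(1-y_hat) )
manual = -(y * torch.log(y_hat)
           + (1 - y) * torch.log(1 - y_hat)).mean()
```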

Activation Functions!

Although the product unit (PU) has shown the capability to handle N-bit parity problems, it has issues in training with the standard backpropagation (BP) algorithm, especially for higher-order inputs (more than three-dimensional input) [20]. According to Leerink et al., this is because of trapping in non-global minima in the case of higher-dimensional inputs [20]. Later, in 2004, Iyoda et al. proposed a single neuron based on a multiplicative neuron model, the πt-neuron model, to solve the XOR and parity-bit problems [16, 17]. They modified the earlier multiplicative π-neuron model to find a suitable tessellation decision surface. Even so, the possibility of non-convergence and non-global minima remains in the previous πt-neuron model.

Our goal is to find the weight vector corresponding to the point where the error is at its minimum, i.e. where the gradient of the error is zero. I recommend you play with the parameters to see how many iterations the network needs to achieve a 100% accuracy rate. There are many combinations of parameter settings, so it really comes down to your experience and the classic examples you can find in the "must read" books. In my case, at iteration 107 the accuracy rate increased to 75% (3 out of 4), and at iteration 169 it produced almost 100% correct results and stayed there until the end. Since training starts with random weights, the iterations on your computer will probably differ slightly, but in the end you will achieve binary precision: 0 or 1.
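The idea of stepping toward the minimum of the error surface can be sketched on a one-dimensional toy problem. This is not the network's actual update rule; it is plain gradient descent on a hand-picked error function $E(w) = (w - 3)^2$, whose minimum sits at $w = 3$:

```python
# Plain gradient descent on E(w) = (w - 3)**2, minimum at w = 3.
# In the network case, the hand-computed derivative below is
# replaced by the gradient that backpropagation delivers.
def gradient_descent(w, lr=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (w - 3)   # dE/dw, computed analytically
        w -= lr * grad       # step against the gradient
    return w

w_final = gradient_descent(w=0.0)  # converges toward 3
```

With a reasonable learning rate the iterate approaches 3; too large a rate makes it oscillate or diverge, which is exactly the parameter sensitivity described above.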

Therefore, previous researchers suggested using a multiplicative neuron model for solving XOR and similar problems. For the system to generalize over the input space and to predict accurately for new cases, we need to train the model on the available inputs. During training, we predict the output of the model for different inputs and compare the predicted output with the actual output in our training set. The difference between the actual and predicted output is termed the loss for that input; the summation of the losses across all inputs is termed the cost function.

The further $x$ goes in the negative direction, the closer the sigmoid gets to 0. However, it never actually touches 0 or 1, which is important to remember. We will define the prediction y_hat and calculate the loss, which is the criterion applied to y_hat and the original y. Then we will store the loss in the all_loss list we created.
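This asymptotic behaviour is easy to verify numerically; a minimal sketch using only the standard library:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps any real x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Far negative inputs approach 0, far positive inputs approach 1,
# but neither endpoint is ever reached exactly.
samples = {x: sigmoid(x) for x in (-10, -2, 0, 2, 10)}
```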

  1. The architecture used here is designed specifically for the XOR problem.
  2. However, this model also has a similar issue in training for higher-order inputs.
  3. On the right-hand side, however, we can see a convex set where the line segment joining the two points from \(S\) lies completely inside the set \(S\).
  4. In this article, we will explore how neural networks can solve this problem and provide a better understanding of their capabilities.
  5. The classic multiplication algorithm has complexity O(n³).

For the XOR gate, the truth table on the left side of the image below shows that the output is 1 only when the two inputs are complementary. If the inputs are the same (0,0 or 1,1), the output will be 0. Plotting the points in the x-y plane on the right shows that they are not linearly separable, unlike the OR and AND gates (at least in two dimensions).
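The truth table can be generated directly in Python, which also makes the separability argument concrete:

```python
# XOR truth table: output is 1 only when the two binary inputs differ.
def xor(a, b):
    return a ^ b

truth_table = {(a, b): xor(a, b) for a in (0, 1) for b in (0, 1)}

# No single line w1*a + w2*b + bias = 0 can place (0,1) and (1,0) on
# one side and (0,0), (1,1) on the other, which is why a single-layer
# perceptron cannot learn XOR and a hidden layer is required.
```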