Neural Networks
----------------
Artificial Intelligence

Developed by Hardeep Singh.
Copyright: Hardeep Singh, 2001.
EMail   : seeingwithc@hotmail.com
Website : seeingwithc.org
All rights reserved. The code may not be used commercially without permission.
The code does not come with any warranties, explicit or implied.
The code cannot be distributed without this header.

Problem:

In all other discussions on this site, I first describe a practical problem and then suggest alternative solutions to it. In this write-up, however, I diverge from that trend: I first present an Artificial Intelligence technique, modelled on the way humans learn to do things, and then suggest uses for it. We will try to make a program learn a boolean function automatically, the way our brain learns things.

Solution:

Take, for example, the task of learning to play the piano. When we try for the first time, we fare badly. However, we stick at it, and each hour of practice we put in takes us closer to our goal, until we can play well.

Scientists trying to understand the working of the human brain think that it is made up of networks of neurons (nerve cells). Each neuron either conducts the signal or does not, depending on its inputs, its weights and a threshold value. In scientific terms, neurons fire or do not fire depending on the summed strength of their inputs across synapses of various strengths. Initially, the neural network (in our brain) has random weights and thus does not do the given task well. As we practice, the brain keeps adjusting the weights and the threshold values of the neurons in the network. After a while, when the weights have been adjusted, we call the neural network trained, and we can do the task well.

We model a neuron through a simpler representation called a Threshold Logic Unit (TLU). A schematic diagram of a TLU is shown below:

[Look at http://www.seeingwithc.org/solu1t5_1.gif]

As can be seen from the diagram, a TLU has inputs X1, X2, ..., Xn. These inputs are multiplied by the corresponding weights W1, W2, ..., Wn and added together. If this sum is greater than the threshold value, the output is high (1); otherwise, the output is low (0).

Here is a TLU implementing a boolean function:

[Look at http://www.seeingwithc.org/solu1t5_2.gif]

If the inputs to this TLU are X1 = 1, X2 = 1 and X3 = 1, the sum of weighted inputs is X1W1 + X2W2 + X3W3 = 1(1) + 1(-1) + 1(1) = 1. Since this is less than the threshold value of 1.5, the output is 0, as expected. However, if the inputs are X1 = 1, X2 = 0 and X3 = 1, the sum of weighted inputs is 1(1) + 0(-1) + 1(1) = 2. This is greater than the threshold value of 1.5, and now the output is 1.

Although a single TLU is sufficient to implement simple boolean functions (functions that are monomials or disjunctions of literals), not all functions can be implemented in this way. Examples are the XOR and the even parity functions. For such functions, a network of TLUs is needed. Such a network of TLUs is called a Neural Network. A neural net implementation of the even parity function is given below:

[Look at http://www.seeingwithc.org/solu1t5_3.gif]

The intermediate TLUs (those with thresholds 1.5 and -0.5) are said to constitute the hidden layer. Networks with one hidden layer can learn any boolean function (although the number of TLUs in the hidden layer may need to grow as the number of variables increases). For more specialised tasks, neural nets may have more hidden layers.
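To make the arithmetic above concrete, here is a small sketch, in the same C/C++ style as the program later in this article, of a TLU evaluated as a weighted sum compared against a threshold. The first part uses the weights (1, -1, 1) and threshold 1.5 from the worked example. The second part wires three such TLUs into a two-input even parity net; since the weights of the figure are not reproduced here, the values used are only one assignment that is consistent with the hidden thresholds 1.5 and -0.5 mentioned above.

#include <stdio.h>

/* Evaluate one TLU: weighted sum of the inputs compared against a threshold. */
int tlu(const double *x, const double *w, int n, double thresh) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i] * w[i];
    return sum > thresh ? 1 : 0;
}

int main(void) {
    /* The TLU from the worked example: W1=1, W2=-1, W3=1, threshold 1.5.
       It outputs 1 only for X1=1, X2=0, X3=1, i.e. it computes X1 AND (NOT X2) AND X3. */
    double w[3] = { 1.0, -1.0, 1.0 };
    double a[3] = { 1.0, 1.0, 1.0 };   /* weighted sum = 1, below 1.5, so output 0 */
    double b[3] = { 1.0, 0.0, 1.0 };   /* weighted sum = 2, above 1.5, so output 1 */
    printf("X=111 -> %d\n", tlu(a, w, 3, 1.5));
    printf("X=101 -> %d\n", tlu(b, w, 3, 1.5));

    /* A two-TLU hidden layer plus an output TLU computing two-input even parity.
       These weights are an illustrative choice consistent with the hidden
       thresholds 1.5 and -0.5 from the text, not the figure's actual values. */
    double wa[2] = {  1.0,  1.0 };     /* fires only for input 11 */
    double wb[2] = { -1.0, -1.0 };     /* fires only for input 00 */
    double wo[2] = {  1.0,  1.0 };     /* ORs the two hidden outputs */
    for (int x1 = 0; x1 <= 1; x1++)
        for (int x2 = 0; x2 <= 1; x2++) {
            double in[2]  = { (double)x1, (double)x2 };
            double hid[2] = { (double)tlu(in, wa, 2, 1.5),
                              (double)tlu(in, wb, 2, -0.5) };
            printf("parity(%d%d) = %d\n", x1, x2, tlu(hid, wo, 2, 0.5));
        }
    return 0;
}

Finding weights like these by hand quickly becomes impractical as functions grow, which is exactly why we want the network to learn them from examples.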
Nets with more than one hidden layer will not concern us here, although the general procedure for training them is the same.

To start with, the weights in the TLU and the threshold value are chosen randomly. Then the TLU is presented with a particular input together with the expected output for it, and the actual output of the TLU is noted. Usually, because the weights are random, the TLU responds in error. This error is used to adjust the weights so that the TLU produces the required output for the given input. In the same way, all the expected values in the training set are used to adjust the weights. Once the TLU is trained, it will respond correctly to every input in the training set. Better still, a trained TLU can be used to calculate the output for inputs that are not in the training set.

The next question is: how do we adjust the TLU weights for a sample input? Although many methods exist for this purpose, we will use the Generalized Delta Procedure. Under this procedure, the threshold function is temporarily (for the purpose of training) replaced by a sigmoid (S-shaped) function. The function we use is 1/(1+e**(-x)), where ** stands for exponentiation.

[Look at http://www.seeingwithc.org/solu1t5_5.gif]

This has the following graph (for x = -5 to 5):

[Look at http://www.seeingwithc.org/solu1t5_4.gif]

As can be seen, this is not very different from the threshold function, but it has the benefit of being differentiable. This property is used to derive the weight change rule given below. Whenever the result from the TLU does not match the desired result from the training set, the weights are adjusted according to the formula:

    W <== W + c(d-f)f(1-f)X

where
    W is the weight,
    c is the constant of learning (we take c to be 1 throughout),
    d is the desired result from the training set (0 or 1),
    f is the actual result from the TLU, with the threshold function replaced by the sigmoid function, and
    X is the input.

Since the threshold value also needs to be adjusted during training, we use the concept of augmented vectors. For this, we add a new input to the TLU, Xn+1, and a new weight for this input, Wn+1. The input Xn+1 is held at 1 and the weight Wn+1 is adjusted like any other weight, while the threshold value is fixed at 0 during training. After training, we discard the additional input and set the threshold to -Wn+1, the negative of the adjusted extra weight. Convince yourself that this gives the same results as the augmented TLU: the augmented TLU fires when the original weighted sum plus Wn+1 exceeds 0, which is exactly when the original weighted sum exceeds -Wn+1.
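Before moving on to the full program, here is a self-contained sketch of the Generalized Delta Procedure exactly as described above: a single augmented TLU (extra input held at 1, threshold fixed at 0) repeatedly updated with W <== W + c(d-f)f(1-f)X. The target function (a two-input AND, which a single TLU can represent), the starting weights and the number of passes are arbitrary choices made for the illustration; they are not taken from the original program.

#include <stdio.h>
#include <math.h>

#define NVAR 2          /* inputs of the function being learned        */
#define AUG  (NVAR + 1) /* augmented size: one extra input fixed at 1  */
#define C    1.0        /* constant of learning, as in the article     */

/* sigmoid of the weighted sum; the threshold is 0 for the augmented TLU */
static double evSigmoid(const double *x, const double *w) {
    double sum = 0.0;
    for (int i = 0; i < AUG; i++) sum += x[i] * w[i];
    return 1.0 / (1.0 + exp(-sum));
}

int main(void) {
    /* training set: two-input AND, a monomial, so one TLU is enough */
    double X[4][AUG] = { {0,0,1}, {0,1,1}, {1,0,1}, {1,1,1} };  /* last input held at 1 */
    double D[4]      = {  0,      0,       0,       1 };
    double w[AUG]    = { 0.1, -0.2, 0.05 };  /* small arbitrary starting weights */

    /* repeatedly apply W <== W + c(d-f)f(1-f)X over the training set */
    for (int pass = 0; pass < 10000; pass++)
        for (int s = 0; s < 4; s++) {
            double f = evSigmoid(X[s], w);
            for (int i = 0; i < AUG; i++)
                w[i] += C * (D[s] - f) * f * (1.0 - f) * X[s][i];
        }

    /* after training, the ordinary (un-augmented) TLU uses threshold -Wn+1 */
    double thresh = -w[NVAR];
    for (int s = 0; s < 4; s++) {
        double sum = X[s][0] * w[0] + X[s][1] * w[1];
        printf("%.0f%.0f -> %d (want %.0f)\n",
               X[s][0], X[s][1], sum > thresh ? 1 : 0, D[s]);
    }
    return 0;
}

The program below applies the same idea to a two-layer net so that functions beyond monomials and disjunctions of literals can be learned.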
/* Program to train a 2 layer feedforward neural network to learn any boolean function */
/* Copyright: Hardeep Singh, 2001. All rights reserved. Email: seeingwithc@hotmail.com */
/* An example (with 3 variables):
   function = x'y'z' + x'yz + xy'z + xyz
   Enter Training Sample: 000 1, 010 0, 011 1, 100 0, 110 1
   Returns correct values for the rest: 001, 101, 111 */

#include <stdio.h>   /* the original header names were lost; these are the ones the code needs */
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include "myown.h"   // contains some simple I/O functions (readstr etc.);
                     // you can write them yourself or contact me

#define MAXNVAR 25        // maximum number of allowed variables;
                          // should be TLU_MULTIPLIER * actual maximum
#define TLU_MULTIPLIER 1  // defines the number of hidden layer TLUs per variable;
                          // actually should be 2^nVar but less will do; can be
                          // increased to 100, though this later appears to be
                          // redundant, 1 is sufficient
#define MAXINPUT 100      // maximum number of training samples input
#define MAXLOOP 30000     // maximum number of times a NNet can be trained on a given input
#define C 1               // constant of learning; 1 is ok, better if 0.1 or 0.01 (my ideas)
#define EPSILON 0.000001

void checkAlloc(void *v) {
    if (!v) {
        printf("Unable to initialise Neural Net: Memory problem.");
        exit(1);
    }
}

class TLU {
private:
    double weights[MAXNVAR];
    int nVar;        // number of variables
    double thresh;   // threshold value: ALWAYS 0
public:
    TLU();
    void setWeight(int i, double value);   // i starts at 0
    double getWeight(int i);
    double getThresh();
    double evSigmoid(double x[MAXNVAR]);   // evaluates using sigmoid function
    int evThresh(double x[MAXNVAR]);       // evaluates using threshold function
    void setnVar(int i);
    int getnVar();
    int doTrain(int x[MAXNVAR], double d) { return 0; }  // stubbed because not required
                                                          // in the current problem;
                                                          // returns a boolean saying if any
                                                          // weight was changed during training
};

/* The bodies of several TLU members were garbled in the original listing;
   the versions below are reconstructed from the declarations and the text. */
TLU::TLU() {
    thresh = nVar = 0;
    for (int i = 0; i < MAXNVAR; i++)
        weights[i] = rand() / (double)RAND_MAX - 0.5;   // random starting weights
}

void TLU::setWeight(int i, double value) { weights[i] = value; }
double TLU::getWeight(int i) { return weights[i]; }
double TLU::getThresh() { return thresh; }

double TLU::evSigmoid(double x[MAXNVAR]) {
    double sum = 0;
    for (int i = 0; i < nVar; i++) sum += weights[i] * x[i];
    return 1 / (1 + exp(-sum));             // thresh is always 0 here
}

int TLU::evThresh(double x[MAXNVAR]) {
    double sum = 0;
    for (int i = 0; i < nVar; i++) sum += weights[i] * x[i];
    return sum > thresh ? 1 : 0;
}

void TLU::setnVar(int i) { nVar = i; }
int TLU::getnVar() { return nVar; }

class NNet {   // neural net specific to the problem
private:
    TLU *hidden;
    TLU final;   // this TLU has TLU_MULTIPLIER*nVar+1 variables,
                 // one is for the threshold function (augmentation)
    int nVar;
    double doTrainEv(double x[MAXNVAR]);
public:
    NNet(int nVar);   // no default constructor
    ~NNet();
    int doTrain(double x[MAXNVAR], double d);
    int doEvaluate(double x[MAXNVAR]);
    TLU getHiddenTLU(int i);
    TLU getFinalTLU();
};

NNet::NNet(int nVar) {
    this->nVar = nVar;
    hidden = new TLU[TLU_MULTIPLIER * nVar];
    checkAlloc(hidden);
    for (int i = 0; i < TLU_MULTIPLIER * nVar; i++)
        hidden[i].setnVar(nVar + 1);             // reconstructed: +1 for the augmented input
    final.setnVar(TLU_MULTIPLIER * nVar + 1);
}

NNet::~NNet() {
    delete[] hidden;
}

TLU NNet::getHiddenTLU(int i) { return hidden[i]; }
TLU NNet::getFinalTLU() { return final; }

double NNet::doTrainEv(double x[MAXNVAR]) {
    /* reconstructed: forward pass using the sigmoid, as used during training */
    double xf[MAXNVAR];
    for (int i = 0; i < TLU_MULTIPLIER * nVar; i++)
        xf[i] = hidden[i].evSigmoid(x);
    xf[TLU_MULTIPLIER * nVar] = 1;   // augmented input of the final TLU
    return final.evSigmoid(xf);
}

int NNet::doEvaluate(double x[MAXNVAR]) {
    /* reconstructed: forward pass using the threshold function */
    double xf[MAXNVAR];
    for (int i = 0; i < TLU_MULTIPLIER * nVar; i++)
        xf[i] = hidden[i].evThresh(x);
    xf[TLU_MULTIPLIER * nVar] = 1;
    return final.evThresh(xf);
}

int NNet::doTrain(double x[MAXNVAR], double d) {
    int ret = false;
    int loop = 0;
    double f = doTrainEv(x);
    /* the loop condition survives only as "... > EPSILON)"; this is a reconstruction */
    while (fabs(d - f) > EPSILON && ++loop <= MAXLOOP) {
        f = doTrainEv(x);
        ret = true;
        double *fhid = new double[TLU_MULTIPLIER * nVar];
        checkAlloc(fhid);
        for (int i = 0; i < TLU_MULTIPLIER * nVar; i++)
            fhid[i] = hidden[i].evSigmoid(x);
        /* ... lost from the original listing: the weights of the final TLU (and of
           the hidden TLUs) are adjusted here using the W <== W + c(d-f)f(1-f)X rule ... */
        delete[] fhid;
    }
    return ret;
}

int main() {
    char str[255];
    /* most of main() was lost from the original listing; what survives is the
       shape of the two input loops below, which read lines with readstr() from
       myown.h and stop on an empty line */
    /* ... lost: read the number of variables and construct the NNet ... */
    for (;;) {
        printf("Enter Training Sample: ");   // prompt taken from the example above
        readstr(str, 254);
        if (str[0] == 0) break;
        /* ... lost: parse the bits and the expected output, call doTrain() ... */
    }
    for (;;) {
        printf("Enter Input: ");             // this prompt was lost; reconstruction
        readstr(str, 254);
        if (str[0] == 0) break;
        /* ... lost: parse the input, call doEvaluate() and print the result;
           the original listing breaks off at this point ... */
    }
    return 0;
}
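Since the training and driver code above is only partly recoverable, here is a separate, self-contained sketch, in the same C/C++ style, of what a working two-layer trainer can look like. It is not the original NNet code: the network size (4 hidden sigmoid units), the learning rate, the epoch count and the random seed are arbitrary choices, and the hidden layer is updated with the usual backpropagated form of the delta rule, which the article does not spell out. It learns two-input XOR, the standard example of a function a single TLU cannot represent; with a different seed it may occasionally need more epochs or a restart.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define NHID  4      /* hidden sigmoid units (an arbitrary, generous choice) */
#define EPOCH 20000  /* training passes over the four samples               */
#define LR    0.5    /* constant of learning                                */

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main(void) {
    /* XOR: the classic function a single TLU cannot implement; third input held at 1 */
    double X[4][3] = { {0,0,1}, {0,1,1}, {1,0,1}, {1,1,1} };
    double D[4]    = {  0,      1,       1,       0 };

    double wh[NHID][3];   /* hidden-layer weights (including the augmented weight) */
    double wo[NHID + 1];  /* output-layer weights; the last one is the augmented one */
    srand(7);             /* fixed seed so the run is repeatable */
    for (int j = 0; j < NHID; j++)
        for (int i = 0; i < 3; i++)
            wh[j][i] = rand() / (double)RAND_MAX - 0.5;
    for (int j = 0; j <= NHID; j++)
        wo[j] = rand() / (double)RAND_MAX - 0.5;

    for (int e = 0; e < EPOCH; e++)
        for (int s = 0; s < 4; s++) {
            /* forward pass with sigmoid units */
            double h[NHID + 1];
            h[NHID] = 1.0;   /* augmented input of the output unit */
            for (int j = 0; j < NHID; j++)
                h[j] = sigmoid(X[s][0]*wh[j][0] + X[s][1]*wh[j][1] + X[s][2]*wh[j][2]);
            double sum = 0.0;
            for (int j = 0; j <= NHID; j++) sum += h[j] * wo[j];
            double f = sigmoid(sum);

            /* output unit: exactly the W <== W + c(d-f)f(1-f)X rule */
            double dout = (D[s] - f) * f * (1.0 - f);
            /* hidden units: the same rule, with the error passed back
               through the output weight connecting each unit to the output */
            for (int j = 0; j < NHID; j++) {
                double dhid = dout * wo[j] * h[j] * (1.0 - h[j]);
                for (int i = 0; i < 3; i++)
                    wh[j][i] += LR * dhid * X[s][i];
            }
            for (int j = 0; j <= NHID; j++)
                wo[j] += LR * dout * h[j];
        }

    /* after training, threshold the trained sigmoid output at 0.5 */
    for (int s = 0; s < 4; s++) {
        double h[NHID + 1];
        h[NHID] = 1.0;
        for (int j = 0; j < NHID; j++)
            h[j] = sigmoid(X[s][0]*wh[j][0] + X[s][1]*wh[j][1] + X[s][2]*wh[j][2]);
        double sum = 0.0;
        for (int j = 0; j <= NHID; j++) sum += h[j] * wo[j];
        printf("%.0f XOR %.0f -> %d (want %.0f)\n",
               X[s][0], X[s][1], sigmoid(sum) > 0.5 ? 1 : 0, D[s]);
    }
    return 0;
}

The output unit here is updated with exactly the rule given in the text; the only extra ingredient is that each hidden unit's share of the error is obtained by passing (d-f)f(1-f) back through the output weight that connects it to the output unit.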