Friday, February 14, 2014

Self Organizing Map (SOM)

Introduction

The Self-Organizing Map (SOM) was developed by Professor Teuvo Kohonen. It has proven useful in many applications and is one of the most popular neural network models. Self-Organizing Maps are aptly named: “self-organizing” because no supervision is required, and “map” because the network attempts to map its weights to conform to the given input data.

It belongs to the category of competitive, unsupervised learning networks, which means that no human intervention is needed during learning. A SOM can therefore be used to cluster data without prior knowledge of the class memberships of the input data.

Architecture


 
Fig: Architecture of SOM network

It is probably the most useful neural net type when the learning process of the human brain is to be simulated. The "heart" of this type is the feature map, a neuron layer whose neurons organize themselves according to the input values. The network is both feed-forward (input layer to feature map) and feedback (within the feature map). It consists of an input layer whose neurons are each connected to every neuron of a second layer, called the "feature map". The feature map can be one- or two-dimensional, and each of its neurons is connected to all other neurons on the map. SOMs are mainly used for classification.

Basic Principle:

The basic idea of the SOM is to make available a certain number of classificatory resources (we will call them neurons, although the term units is also widely used) which organize themselves based on the patterns available for classification (called here input patterns). A very relevant characteristic of the SOM, and in fact its trademark, is the development of an ordered network in which nearby neurons share similarities; as a result, similar patterns activate similar areas of the SOM. Different parts of the SOM are thus activated by specific types of input patterns, leading to a representation built on global organization and local specialization. The SOM is trained iteratively over a large number of epochs. An epoch is one pass through all the input patterns, so each input pattern is processed as many times as there are epochs.

Variables
1. Input data: X [MxN]. M data rows, each with N features.
2. Maximum number of classes: MC
3. Weights: W [MCxN]
4. Distances: DIST [1xMC], the distances from a single data row to each of the MC weight vectors.
5. Learning rate: LR
6. Minimum distance: MIND
7. Minimum-distance class (index): MINDC
8. Best matching unit (winning neuron): BMU
9. Epochs: E, the number of iterations over the data.
10. Output class: OUTC

Algorithm & Pseudo Code

A. Training

1. Set the learning rate (LR).
2. Set the maximum number of classes (MC).
3. Initialize random weights W of dimension MCxN, where N is the number of features.
4. Set the number of epochs (E).
5. For each of the E epochs:
a. For each input data row X(i):
b. Calculate the distance from X(i) to every weight vector.
c. Find the minimum distance MIND (out of the MC distances).
d. Find the index of the minimum distance, MINDC.
e. The BMU, W(MINDC,:), is the weight vector at that minimum distance.
f. Update the BMU's weights using the formula below:
g. W(MINDC,:) = W(MINDC,:) + LR * (X(i,:) - W(MINDC,:))
h. (end of the data loop)
6. (end of the epoch loop)
7. Save the final weights.
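As a sketch, the training steps above can be written in plain Python. This is a minimal winner-take-all version: a full SOM would also shrink LR over time and update the BMU's neighbours, which the steps above omit. The function name and defaults are illustrative, not from the original.

```python
import random

def som_train(X, MC, LR=0.6, E=100, W0=None):
    """Steps 1-7 above: winner-take-all training.
    X is an MxN list of data rows, MC the number of classes;
    returns the MCxN weight matrix W."""
    N = len(X[0])
    # step 3: random initial weights (or a caller-supplied matrix)
    if W0 is None:
        W = [[random.random() for _ in range(N)] for _ in range(MC)]
    else:
        W = [list(row) for row in W0]
    for _ in range(E):                                # step 5: epoch loop
        for x in X:                                   # step a: data loop
            # step b: squared distance from x to every weight vector
            dist = [sum((xj - wj) ** 2 for xj, wj in zip(x, w)) for w in W]
            mindc = dist.index(min(dist))             # steps c-d: winner index
            # steps f-g: pull the BMU toward the input
            W[mindc] = [wj + LR * (xj - wj) for xj, wj in zip(x, W[mindc])]
    return W
```

With the worked example's data and initial weights, class 1 ends up winning i1 and i3, and class 2 wins i2 and i4.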

B. Testing

1. For each test data row TEST(i):
2. Calculate the distance from TEST(i) to all the final weight vectors.
3. Find the minimum distance (out of the MC distances).
4. Find the index of the minimum distance, MINDC.
5. MINDC is the probable class for this data.
6. (end of the test-data loop)
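The testing steps map to a few lines of Python (the function name is illustrative; distances are squared Euclidean, as in training; class indices are 0-based, so index 0 is Class1):

```python
def som_classify(TEST, W):
    """Steps 1-6 above: assign each test row the index (MINDC) of its
    nearest weight vector, by squared Euclidean distance."""
    out = []
    for row in TEST:
        dist = [sum((rj - wj) ** 2 for rj, wj in zip(row, w)) for w in W]
        out.append(dist.index(min(dist)))             # MINDC: the probable class
    return out
```

With the final weights of the worked example, Class1 = (1, 0.5, 0, 0) and Class2 = (0, 0, 0.5, 1), the four training samples map back to classes 1, 2, 1, 2.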

Example:

Training
Samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)

Initial weight matrix (random values between 0 and 1):
Class1: 0.8, 0.4, 0.7, 0.3
Class2: 0.2, 0.6, 0.5, 0.9
Learning rate: LR = 0.6 (used in all the updates below)

Iteration one:

For i1 = (1, 1, 0, 0):
Distance1 = (1-0.8)^2 + (1-0.4)^2 + (0-0.7)^2 + (0-0.3)^2 = 0.98
Distance2 = (1-0.2)^2 + (1-0.6)^2 + (0-0.5)^2 + (0-0.9)^2 = 1.86
Minimum distance = 0.98 (Distance1), so we update the weights of Class1:
0.8 → 0.8 + (1-0.8)*0.6 = 0.92
0.4 → 0.4 + (1-0.4)*0.6 = 0.76
0.7 → 0.7 + (0-0.7)*0.6 = 0.28
0.3 → 0.3 + (0-0.3)*0.6 = 0.12

New weights:
Class1: 0.92, 0.76, 0.28, 0.12
Class2: 0.2, 0.6, 0.5, 0.9
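The iteration-one arithmetic for i1 can be checked directly in Python:

```python
LR = 0.6
x = [1, 1, 0, 0]                                    # input i1
W = [[0.8, 0.4, 0.7, 0.3],                          # Class1
     [0.2, 0.6, 0.5, 0.9]]                          # Class2

dist = [sum((xj - wj) ** 2 for xj, wj in zip(x, w)) for w in W]
# dist is [0.98, 1.86] (up to float rounding), so Class1 wins
bmu = dist.index(min(dist))
W[bmu] = [wj + LR * (xj - wj) for xj, wj in zip(x, W[bmu])]
# W[0] is now [0.92, 0.76, 0.28, 0.12], matching the hand calculation
```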
                                            


For i2 = (0, 0, 0, 1):
Distance1 = (0-0.92)^2 + (0-0.76)^2 + (0-0.28)^2 + (1-0.12)^2 = 2.2768
Distance2 = (0-0.2)^2 + (0-0.6)^2 + (0-0.5)^2 + (1-0.9)^2 = 0.66
Minimum distance = 0.66 (Distance2), so we update the weights of Class2:
0.2 → 0.2 + (0-0.2)*0.6 = 0.08
0.6 → 0.6 + (0-0.6)*0.6 = 0.24
0.5 → 0.5 + (0-0.5)*0.6 = 0.2
0.9 → 0.9 + (1-0.9)*0.6 = 0.96

New weights:
Class1: 0.92, 0.76, 0.28, 0.12
Class2: 0.08, 0.24, 0.2, 0.96




For i3 = (1, 0, 0, 0):
Distance1 = (1-0.92)^2 + (0-0.76)^2 + (0-0.28)^2 + (0-0.12)^2 = 0.6768
Distance2 = (1-0.08)^2 + (0-0.24)^2 + (0-0.2)^2 + (0-0.96)^2 = 1.8656
Minimum distance = 0.6768 (Distance1), so we update the weights of Class1:
0.92 → 0.92 + (1-0.92)*0.6 = 0.968
0.76 → 0.76 + (0-0.76)*0.6 = 0.304
0.28 → 0.28 + (0-0.28)*0.6 = 0.112
0.12 → 0.12 + (0-0.12)*0.6 = 0.048

New weights:
Class1: 0.968, 0.304, 0.112, 0.048
Class2: 0.08, 0.24, 0.2, 0.96

For i4 = (0, 0, 1, 1):
Distance1 = (0-0.968)^2 + (0-0.304)^2 + (1-0.112)^2 + (1-0.048)^2 = 2.7243
Distance2 = (0-0.08)^2 + (0-0.24)^2 + (1-0.2)^2 + (1-0.96)^2 = 0.7056
Minimum distance = 0.7056 (Distance2), so we update the weights of Class2:
0.08 → 0.08 + (0-0.08)*0.6 = 0.032
0.24 → 0.24 + (0-0.24)*0.6 = 0.096
0.2 → 0.2 + (1-0.2)*0.6 = 0.68
0.96 → 0.96 + (1-0.96)*0.6 = 0.984

New weights:
Class1: 0.968, 0.304, 0.112, 0.048
Class2: 0.032, 0.096, 0.68, 0.984


Final weights:
After many epochs the weights approach:
Class1: 1, 0.5, 0, 0
Class2: 0, 0, 0.5, 1


TESTING

With the same training samples:
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)

Test Data           Distance1   Distance2   Winning Class
i1 = (1, 1, 0, 0)     0.25        3.25      Class1
i2 = (0, 0, 0, 1)     2.25        0.25      Class2
i3 = (1, 0, 0, 0)     0.25        2.25      Class1
i4 = (0, 0, 1, 1)     3.25        0.25      Class2

With new data:

New Data            Distance1   Distance2   Winning Class
(1, 1, 1, 0)          1.25        3.25      Class1

Thursday, February 13, 2014

Perceptron learning in Neural Network

Introduction
In ANNs the perceptron is a supervised learning algorithm, used mainly for linearly separable classification problems. The system can learn concepts and respond with TRUE (1) or FALSE (0).
The perceptron is a single-layer network whose weights and bias can be trained to produce the correct output for a training dataset. This training procedure is called the Perceptron Learning Rule.


The perceptron has a great ability to generalize from its training data and work with randomly distributed connections. Perceptrons are especially suited for simple problems in pattern classification.
Fig: Perceptron neural network




Training
Data from a training set are presented to the network one after another. If the network's output is correct, no change is made. Otherwise, the weights and biases are updated using the perceptron learning rule. An entire pass through all of the input training data is called an epoch. When such an entire pass of the training set has occurred without error, training is complete.
At this time any of the training inputs may be presented to the network and it will respond with the correct output. If an input P that is not in the training set is presented, the network tends to generalize: it responds with an output similar to the target outputs for training inputs close to P.

The Learning Rule

Let’s say,
Input data: [X1, X2, X3, …, XN]
Output data: [Y1, Y2, Y3, …, YN], where Yi ∈ {0, 1}
Number of features: P, so each input is Xi = [Xi1, Xi2, …, XiP]
Weights: W = [W1, W2, W3, …, WP]
Bias: B
Threshold: TH
Learning rate: LR

ALGORITHM

Set random weights W of dimension 1xP;
Set bias B of dimension 1x1;
Epoch = 0;

Do {
    Correct = 0;
    Epoch++;
    For I = 1 to N {
        Activation = 0;
        For J = 1 to P {
            Activation = Activation + WJ * XIJ;
        }
        Activation = Activation + B;
        If Activation > TH
            Result = 1;
        Else
            Result = 0;
        If Result == YI
            Correct++;
        Else {
            For K = 1 to P {
                WK = WK + LR * (YI - Result) * XIK;
            }
            B = B + LR * (YI - Result);
        }
    }
    If Epoch > 1000
        Break;
} While (Correct < N)
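A runnable Python version of the pseudocode. One departure, noted in the docstring: weights start at zero rather than random values, so the run is reproducible.

```python
def perceptron_train(X, Y, LR=0.1, TH=0.5, max_epochs=1000):
    """The pseudocode above in Python. Departure: weights start at
    zero rather than random values, so runs are reproducible."""
    P = len(X[0])
    W = [0.0] * P
    B = 0.0
    for _ in range(max_epochs):                 # the Do/While with the epoch cap
        correct = 0
        for x, y in zip(X, Y):
            activation = sum(wj * xj for wj, xj in zip(W, x)) + B
            result = 1 if activation > TH else 0
            if result == y:
                correct += 1
            else:                               # perceptron learning rule
                W = [wj + LR * (y - result) * xj for wj, xj in zip(W, x)]
                B += LR * (y - result)
        if correct == len(X):                   # error-free epoch: done
            break
    return W, B
```

Trained on the OR-gate data of the next section, it converges in a handful of epochs.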



Example: OR GATE FUNCTION

Parameters: TH = 0.5, LR = 0.1, B = 0 (the bias is never updated in this trace).

Column key: C1 = X1*W1; C2 = X2*W2; S = C1 + C2; N = 1 if S > TH, else 0 (output); E = Y - N (error); C = LR*E (correction); W1' = W1 + X1*C and W2' = W2 + X2*C (final weights for the row).

X1 X2 Y | W1  W2  | C1  C2  S   | N  E  C   | W1' W2'
Epoch 1
0  0  0 | 0   0   | 0   0   0   | 0  0  0   | 0   0
0  1  1 | 0   0   | 0   0   0   | 0  1  0.1 | 0   0.1
1  0  1 | 0   0.1 | 0   0   0   | 0  1  0.1 | 0.1 0.1
1  1  1 | 0.1 0.1 | 0.1 0.1 0.2 | 0  1  0.1 | 0.2 0.2
Epoch 2
0  0  0 | 0.2 0.2 | 0   0   0   | 0  0  0   | 0.2 0.2
0  1  1 | 0.2 0.2 | 0   0.2 0.2 | 0  1  0.1 | 0.2 0.3
1  0  1 | 0.2 0.3 | 0.2 0   0.2 | 0  1  0.1 | 0.3 0.3
1  1  1 | 0.3 0.3 | 0.3 0.3 0.6 | 1  0  0   | 0.3 0.3
Epoch 3
0  0  0 | 0.3 0.3 | 0   0   0   | 0  0  0   | 0.3 0.3
0  1  1 | 0.3 0.3 | 0   0.3 0.3 | 0  1  0.1 | 0.3 0.4
1  0  1 | 0.3 0.4 | 0.3 0   0.3 | 0  1  0.1 | 0.4 0.4
1  1  1 | 0.4 0.4 | 0.4 0.4 0.8 | 1  0  0   | 0.4 0.4
Epoch 4
0  0  0 | 0.4 0.4 | 0   0   0   | 0  0  0   | 0.4 0.4
0  1  1 | 0.4 0.4 | 0   0.4 0.4 | 0  1  0.1 | 0.4 0.5
1  0  1 | 0.4 0.5 | 0.4 0   0.4 | 0  1  0.1 | 0.5 0.5
1  1  1 | 0.5 0.5 | 0.5 0.5 1.0 | 1  0  0   | 0.5 0.5
Epoch 5
0  0  0 | 0.5 0.5 | 0   0   0   | 0  0  0   | 0.5 0.5
0  1  1 | 0.5 0.5 | 0   0.5 0.5 | 0  1  0.1 | 0.5 0.6
1  0  1 | 0.5 0.6 | 0.5 0   0.5 | 0  1  0.1 | 0.6 0.6
1  1  1 | 0.6 0.6 | 0.6 0.6 1.2 | 1  0  0   | 0.6 0.6
Epoch 6
0  0  0 | 0.6 0.6 | 0   0   0   | 0  0  0   | 0.6 0.6
0  1  1 | 0.6 0.6 | 0   0.6 0.6 | 1  0  0   | 0.6 0.6
1  0  1 | 0.6 0.6 | 0.6 0   0.6 | 1  0  0   | 0.6 0.6
1  1  1 | 0.6 0.6 | 0.6 0.6 1.2 | 1  0  0   | 0.6 0.6
Epoch 7 (no errors, so training stops)
0  0  0 | 0.6 0.6 | 0   0   0   | 0  0  0   | 0.6 0.6
0  1  1 | 0.6 0.6 | 0   0.6 0.6 | 1  0  0   | 0.6 0.6
1  0  1 | 0.6 0.6 | 0.6 0   0.6 | 1  0  0   | 0.6 0.6
1  1  1 | 0.6 0.6 | 0.6 0.6 1.2 | 1  0  0   | 0.6 0.6

The final weights are W1 = 0.6 and W2 = 0.6 (with B = 0), and every OR pattern is then classified correctly.
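The trace can be replayed in a few lines. Exact rational arithmetic (`fractions.Fraction`) is used here because the table repeatedly compares S = 0.5 against TH = 0.5, a borderline comparison that binary floating point gets wrong:

```python
from fractions import Fraction

# Replay the OR-gate trace: TH = 0.5, LR = 0.1, bias fixed at 0 as in
# the table. Exact fractions keep the repeated S = 0.5 vs TH = 0.5
# comparison honest (0.5 is NOT greater than 0.5, so N = 0 there).
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 1]
TH, LR = Fraction(1, 2), Fraction(1, 10)
W = [Fraction(0), Fraction(0)]
for epoch in range(7):                  # the table spans 7 epochs
    for (x1, x2), y in zip(X, Y):
        s = x1 * W[0] + x2 * W[1]       # S = C1 + C2
        n = 1 if s > TH else 0          # thresholded output N
        c = LR * (y - n)                # correction C = LR * E
        W[0] += x1 * c
        W[1] += x2 * c
# W ends at [0.6, 0.6] and epoch 7 makes no corrections, as in the table
```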

Limitations
Perceptron networks have several limitations.
1. First, the output of a perceptron can take on only one of two values (True or False).
2. Second, perceptrons can only classify linearly separable sets of vectors. If a straight line or plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable and the perceptron will find the solution.
3. If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly.
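Limitation 3 is easy to demonstrate with XOR, the classic non-separable case: the learning rule never completes an error-free epoch (the 200-epoch cutoff below is an arbitrary choice for the sketch):

```python
# XOR targets are not linearly separable, so the perceptron rule never
# produces an error-free epoch (200 epochs is an arbitrary cutoff).
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]                        # XOR
TH, LR = 0.5, 0.1
W, B = [0.0, 0.0], 0.0
perfect_epoch = None
for epoch in range(200):
    errors = 0
    for (x1, x2), y in zip(X, Y):
        n = 1 if x1 * W[0] + x2 * W[1] + B > TH else 0
        if n != y:
            errors += 1
            c = LR * (y - n)
            W[0] += x1 * c
            W[1] += x2 * c
            B += c
    if errors == 0:                     # never happens for XOR
        perfect_epoch = epoch
        break
# perfect_epoch is still None: no linear threshold separates XOR
```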


MULTICLASS PERCEPTRON NETWORKS:
Let X = {x1, x2, …, xn} be the training data, where each object belongs to one of C classes: c(x) ∈ {1, …, C}. Each class has its own linear classifier with weights wi and bias bi.
The classification rule becomes
c(x) = argmax (wi · x + bi) for all i = 1 to C
In compact notation the bias is folded into the weights:
wi0 = (wi1, …, wid, bi) and x0 = (x1, …, xd, 1), so that c(x) = argmax (wi0 · x0).

MULTI-CLASS PERCEPTRON ALGORITHM
Let’s say
Input data X = {X1, X2, …, Xn}
Output data Y = {Y1, Y2, …, Yn}, where each Yi ∈ {1, …, C}
Weights W = {W1, W2, …, WC}
Bias B = {B1, B2, …, BC}

For all x in X
    i = c(x)
    g = Wi · x
    error = false
    For j = 1 to C, j ≠ i
        If g < Wj · x then
            error = true
            Wj = Wj - LR * x
    End for
    If error then Wi = Wi + LR * x
End for all

In case of misclassification, the algorithm modifies both the w associated with the correct class (moved toward x) and the w of each class whose score wj · x is higher than the score of the correct class (moved away from x).
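A runnable sketch of the algorithm. The toy data in the test is invented for illustration, and one change from the pseudocode is flagged in the docstring: a tie in scores counts as an error, so learning can get started from all-zero weights.

```python
def multiclass_perceptron(X, Y, C, LR=1.0, epochs=500):
    """Sketch of the multi-class algorithm above. The bias is folded
    into the weights by appending 1 to each x (the wi0/x0 notation).
    One change: a tie (Wj.x == g) counts as an error, so learning can
    get started from all-zero weights."""
    X = [list(x) + [1.0] for x in X]             # x0 = (x1, ..., xd, 1)
    d = len(X[0])
    W = [[0.0] * d for _ in range(C)]            # wi0 = (wi1, ..., wid, bi)

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    for _ in range(epochs):
        for x, y in zip(X, Y):
            g = dot(W[y], x)                     # score of the correct class
            error = False
            for j in range(C):
                if j != y and dot(W[j], x) >= g: # a rival scores at least as high
                    error = True
                    W[j] = [wk - LR * xk for wk, xk in zip(W[j], x)]
            if error:                            # reward the correct class
                W[y] = [wk + LR * xk for wk, xk in zip(W[y], x)]
    return W
```

On a small invented 3-class dataset (clusters near (0,0), (5,0) and (0,5)) this separates all points, with prediction by argmax over the C scores.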