Friday, January 2, 2015

Probabilistic Neural Network



Consider the problem of multi-class classification. We are given a set of data points from each class. The objective is to classify any new data sample into one of the classes.
A Probabilistic Neural Network (PNN) is well suited to this kind of multi-class classification.
Architecture
A PNN is an implementation of a statistical algorithm called kernel discriminant analysis in which the operations are organized into a multilayered feedforward network with four layers.
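As a sketch of what the four layers compute together, assume the common Gaussian (Parzen window) kernel with a single smoothing parameter sigma and equal prior probabilities for all classes. The symbols below are notation introduced here for illustration: x_{k,i} is the i-th training case of class k, N_k is the number of training cases in class k, and C is the number of classes. The normalizing constant of the Gaussian is omitted because it is the same for every class.

```latex
\hat{f}_k(\mathbf{x}) = \frac{1}{N_k} \sum_{i=1}^{N_k}
  \exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{x}_{k,i} \rVert^2}{2\sigma^2} \right),
\qquad
\hat{y}(\mathbf{x}) = \arg\max_{k \in \{1,\dots,C\}} \hat{f}_k(\mathbf{x})
```

Each term of the sum is one pattern neuron, the average over i is the summation layer, and the arg max is the output layer.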
1)    Input layer
The input layer has one neuron for each predictor variable, holding that variable's measurement. A categorical variable with N categories is encoded using N-1 neurons. Each input neuron standardizes its value by subtracting the median and dividing by the interquartile range, then feeds the result to every neuron in the pattern layer.
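The standardization step can be sketched in C as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the quartiles are computed by linear interpolation between order statistics, which is only one of several common conventions.

```c
#include <stdlib.h>

/* qsort comparator for doubles */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Quantile q in [0,1] of a sorted array, by linear interpolation
   between neighbouring order statistics (an assumed convention). */
static double quantile_sorted(const double *sorted, int n, double q)
{
    double pos = q * (n - 1);
    int lo = (int)pos;
    int hi = (lo + 1 < n) ? lo + 1 : lo;
    return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
}

/* Standardize one predictor in place: (v - median) / IQR.
   Returns 0 on success, -1 if n < 1, allocation fails, or the IQR is zero. */
int robust_standardize(double *values, int n)
{
    if (n < 1) return -1;
    double *sorted = malloc((size_t)n * sizeof *sorted);
    if (!sorted) return -1;
    for (int i = 0; i < n; i++) sorted[i] = values[i];
    qsort(sorted, (size_t)n, sizeof *sorted, cmp_double);

    double median = quantile_sorted(sorted, n, 0.50);
    double iqr = quantile_sorted(sorted, n, 0.75)
               - quantile_sorted(sorted, n, 0.25);
    free(sorted);
    if (iqr == 0.0) return -1;   /* constant-like predictor: leave untouched */

    for (int i = 0; i < n; i++)
        values[i] = (values[i] - median) / iqr;
    return 0;
}
```

In practice the median and interquartile range would be computed once on the training data and then reused for every test case, so that training and test values share the same scale.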
2)    Pattern layer
The pattern layer contains one neuron for each case in the training data set. Each neuron stores the predictor values of its case, which serve as the center of a Gaussian function, along with the case's target value. Given a test case, a pattern neuron computes the Euclidean distance from the test case to its center point and then applies the RBF kernel function with the smoothing parameter sigma.
3)    Summation layer
The summation layer has one neuron per class. Each one sums the outputs of the pattern neurons whose training cases belong to that class.
4)    Output layer
The output layer compares the class sums and selects the largest. The class label associated with that sum is the prediction.


Advantages
PNNs have several advantages and disadvantages.
·         PNNs train much faster than multilayer perceptron networks.
·         PNNs approach Bayes optimal classification.
·         They are guaranteed to converge to an optimal classifier as the size of the representative training set increases.
·         They have an inherently parallel structure.
·         PNNs are relatively insensitive to outliers.
·         PNNs can be more accurate than multilayer perceptron networks.
·         PNNs generate accurate predicted target probability scores.
Disadvantages
·         PNNs are slower than multilayer perceptron networks at classifying new cases, because every stored training case is evaluated.
·         PNNs require more memory space to store the model, since every training case is kept.
·         PNNs require a representative training set.

 PNN Pseudo Code
#include <math.h>

// C is the number of classes, N is the number of training examples
// d is the dimensionality of the training examples, sigma is the smoothing factor
// test_example[d] is the example to be classified
// Examples[N][d] are the training examples
// labels[i] (a value in 1..C) is the class of Examples[i]
// All vectors are assumed to be normalized to unit length, so that
// exp((x.w - 1) / sigma^2) equals the Gaussian exp(-|x - w|^2 / (2 sigma^2))
int PNN(int C, int N, int d, float sigma, float test_example[d],
        float Examples[N][d], int labels[N])
{
    int classify = -1;
    float largest = 0;
    // The OUTPUT layer, which compares the estimated pdf of each class k
    for (int k = 1; k <= C; k++)
    {
        float sum = 0;
        int Nk = 0; // number of training examples from class k
        // The SUMMATION layer, which accumulates the pdf
        // over the examples from the particular class k
        for (int i = 0; i < N; i++)
        {
            if (labels[i] != k)
                continue;
            float product = 0;
            // The PATTERN layer, which multiplies the test example by the weights
            for (int j = 0; j < d; j++)
                product += test_example[j] * Examples[i][j];
            product = (product - 1) / (sigma * sigma);
            sum += expf(product);
            Nk++;
        }
        if (Nk > 0)
            sum /= Nk;
        if (sum > largest)
        {
            largest = sum;
            classify = k;
        }
    }
    return classify;
}
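The pattern layer in the code above uses a dot product rather than a distance. A short derivation shows why this is still a Gaussian kernel, assuming both the test example x and the stored example w have been scaled to unit length (an assumption made here for the derivation):

```latex
\lVert \mathbf{x} - \mathbf{w} \rVert^2
  = \lVert \mathbf{x} \rVert^2 + \lVert \mathbf{w} \rVert^2 - 2\,\mathbf{x}\cdot\mathbf{w}
  = 2 - 2\,\mathbf{x}\cdot\mathbf{w}
\quad\Longrightarrow\quad
\exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{w} \rVert^2}{2\sigma^2} \right)
  = \exp\!\left( \frac{\mathbf{x}\cdot\mathbf{w} - 1}{\sigma^2} \right)
```

If the inputs are not of unit length, the two forms differ, and the distance form on the left is the one that matches the description of the pattern layer.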
Example
Input Data Set:
X     Y     CLASS
1     0     1
0     1     1
1     1     1
-1    0     2
0     -1    2

Test data: [0.5, 0.5]
PNN:
X     Y     CLASS   Count-1   Count-2
1     0     1       3         2
0     1     1
1     1     1
-1    0     2
0     -1    2

Count-1 and Count-2 are the numbers of training points in class 1 and class 2.

Test point:
X1    X2
0.5   0.5

X   Y   X-X1  Y-X2  (X-X1)^2  (Y-X2)^2  exp(-(X-X1)^2)  exp(-(Y-X2)^2)  Sum
1   0   0.5   -0.5  0.25      0.25      0.778800783     0.778800783     1.557601566
0   1   -0.5  0.5   0.25      0.25      0.778800783     0.778800783     1.557601566
1   1   0.5   0.5   0.25      0.25      0.778800783     0.778800783     1.557601566
-1  0   -1.5  -0.5  2.25      0.25      0.105399225     0.778800783     0.884200008
0   -1  -0.5  -1.5  0.25      2.25      0.778800783     0.105399225     0.884200008

Here Sum = exp(-(X-X1)^2) + exp(-(Y-X2)^2). The exponent -(difference)^2 corresponds to a Gaussian kernel exp(-(difference)^2 / (2 sigma^2)) with sigma^2 = 1/2.

SUM(CLASS1) = 4.672804698
SUM(CLASS2) = 1.768400015
Y1 = SUM(CLASS1) / Count-1 = 1.557601566
Y2 = SUM(CLASS2) / Count-2 = 0.884200008

Since Y1 > Y2, the test point [0.5, 0.5] is assigned to class 1.