5.3.1 Classification using an Artificial Neural Network

 

As we were very interested in our project, we tried to use the available data in as many approaches as time allowed.

From the very beginning we noticed that it would be difficult to classify the entire images (1000x1000 pixels) using a neural network. We studied the functionality available in the Netlab software [1], NeuroSolutions [2], and Matlab's Neural Network Toolbox, and since we were most familiar with Matlab, we decided to use it on a 256x256 area where we have training data.

 

The newff function creates a feed-forward backpropagation network. Feed-forward networks consist of:

1. Nl layers using the dot product weight function.

2. A net input function that calculates a layer's net input by combining its weighted inputs and biases.

3. Specified transfer functions.

The first layer has weights coming from the input, and each subsequent layer has weights coming from the previous layer; the last layer is the network output.

 

The general form of the function is:

 

newff (PR,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF)

 

where PR is an R x 2 matrix of minimum and maximum values for the R input elements. Since we treat the images as vectors, we will have a 6x2 matrix that contains the minimum and the maximum value of each image in a row.

 

Si represents the size of the ith layer; e.g., with a hidden layer of two neurons and a single output neuron, the S vector will look like [2 1].

For each layer we have to define a transfer function. Transfer functions calculate a layer's output from its net input; the default TFi is 'tansig', which calculates its output according to:

tansig(n) = 2/(1 + exp(-2*n)) - 1

 

An alternative is the log-sigmoid transfer function: logsig(N) takes one input, N, an S x Q matrix of net input (column) vectors, and returns each element of N squashed between 0 and 1:

logsig(n) = 1/(1 + exp(-n))

 

BTF – the backpropagation network training function, default = 'traingdx'

BLF – the backpropagation weight/bias learning function, default = 'learngdm'

 

PF is the performance function; the default is 'mse', which measures the network's performance according to the mean of the squared errors.

 

We used trainlm instead, a network training function that updates weight and bias values according to Levenberg-Marquardt optimization.
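
As a minimal sketch of how these pieces fit together (assuming the six input images are stored as the rows of a matrix P and the training labels in a vector T; the variable names and training parameters are our illustration, not the exact settings we used):

% P: 6 x Q matrix, one row per input image, one column per pixel.
% T: 1 x Q vector of targets (1 = foreground, 0 = background).
PR = [min(P,[],2) max(P,[],2)];                  % 6x2 matrix of min/max values
net = newff(PR, [2 1], {'tansig' 'logsig'}, 'trainlm');
net.trainParam.epochs = 100;                     % assumed stopping criteria
net.trainParam.goal = 1e-3;
net = train(net, P, T);
output = sim(net, P);                            % outputs between 0 and 1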

 

The measure of accuracy

 

As we have only one class in the training data, we assign the foreground to ones and the background to zeros. Therefore we count an output as correct when its value is larger than 0.99 where the training data equals 1.0, or smaller than 0.01 where it equals 0.0, i.e.:

 

if (target == 1 & output > 0.99)
    correct = correct + 1;      % foreground pixel recognized
elseif (target == 0 & output < 0.01)
    correct = correct + 1;      % background pixel recognized
end

 

We compute the accuracy of three nets and select the one with the highest accuracy as the best net to simulate the input data with.

We use half of both the training and the input data (every second value) for training, and the other half for testing; a sketch of this split and the best-of-three selection is shown below. One epoch of training is defined as a single presentation of all input vectors to the network; the network is then updated according to the results of all those presentations. Training continues until a maximum number of epochs is reached, the performance goal is met, or another stopping condition applies.
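
As a sketch of this procedure (illustrative variable names, reusing the setup above):

idxTrain = 1:2:size(P,2);                        % every second pixel for training
idxTest = 2:2:size(P,2);                         % the remaining pixels for testing
bestAcc = 0;
for k = 1:3                                      % train three nets
    net = newff(PR, [2 1], {'tansig' 'logsig'}, 'trainlm');
    net = train(net, P(:,idxTrain), T(idxTrain));
    out = sim(net, P(:,idxTest));
    acc = sum((T(idxTest) == 1 & out > 0.99) | ...
              (T(idxTest) == 0 & out < 0.01)) / numel(idxTest);
    if acc > bestAcc                             % keep the most accurate net
        bestAcc = acc;
        bestNet = net;
    end
end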

The figure below shows the training step, and how the mean of the squared errors decreases significantly in the first few epochs.

 

Figure 100, the training step.

 

The classified image clearly shows the difference between the cleared areas and the unchanged area; we can even distinguish at least one more group of fields, namely the light blue area in the classified image.

Comparing the false color composite and the training data, and because the cleared areas have significantly higher values and differ strongly from the background, it is easy to select a threshold value, which in this case was 0.18.
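
As a minimal sketch (the variable name is ours), the thresholding is a single comparison:

mask = classified > 0.18;                        % 1 = cleared area, 0 = background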

 

It is important to mention that there are relatively large differences between the training data and the classified image. In some cases shadows break one field into two or more parts (see the ellipses); in other cases the false color composite clearly shows disconnected fields, while they appear connected in the training data.

 


 

Figure 101, the inputs and results for the 1993 data: (a) the false color composite of the training data area, (b) the training data, (c) the classified image, obtained at a net accuracy of 0.83102, (d) the thresholded image, t = 0.17.

 

The classification of the second year's data was done the same way. We notice that the polygons are generally thicker in the training data than in the false color composite, and that is also the case in the classified image.

 


 

Figure 102, the inputs and results for the 1998 data: (a) the false color composite of the training data area, (b) the training data, (c) the classified image, obtained at a net accuracy of 0.85934, (d) the thresholded image, t = 0.20.

 

 We expect that this will significantly affect the accuracy assessment. 

 

After obtaining the thresholded images, we label them differently to be able to distinguish the classes when we combine them into one image; for instance, we assign the cleared areas in 1993 the value 10 and the cleared areas of 1998 the value 20, as sketched below. Adding the images, we will get areas that overlap between the two years, which represent no-change areas. Because the polygons represent cleared areas, we name the polygons from 1993 covered areas when we plot them together with year 1998.
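
A minimal sketch of this step (illustrative variable names; mask93 and mask98 stand for the binary images of the two years, whether training polygons or thresholded classifications):

combined = 10 * mask93 + 20 * mask98;            % 10 and 20 mark the cleared areas,
                                                 % 30 marks the overlap (no change)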

Figure 103, the combined training data 93-98

 

We do the same for the thresholded-classified images.

By visual comparison alone, we can see that the two images differ considerably, even though the shapes of the polygons are generally preserved.

 

Figure 104, the combined classified image representing the changes that happened during 1993-1998.

 

We then computed the confusion matrix, and the results were as we expected.

The next table shows the confusion matrix and the kappa value.

 

 

Neural Network 93-98

                      U to C        C to U        NC to B       Sum of rows
U to C                536           0             55            591
C to U                5             726           42            773
NC                    528           333           27            888
Correct (diagonal)                                              1289
Cls sum               1069          1059          124

Producer Accuracy     0.501403181   0.685552      0.217742
User Accuracy         0.906937394   0.939198      0.030405
Overall Accuracy      0.572380107
Kappa                 0.382320623

 

 

 

 

where:

U to C: uncovered to covered
C to U: covered to uncovered
NC: unchanged

In addition, 157 pixels were classified as no change. NC to B represents the no-change areas that were classified as background.

So, from a total of 2409 pixels, only 1289 are classified correctly according to the training data, which we think is not of a quality that allows statistical analysis.
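
For reference, the overall accuracy and the kappa value above can be reproduced from the confusion matrix; a minimal sketch (the variable names are ours):

CM = [536 0 55; 5 726 42; 528 333 27];           % the confusion matrix above
N = sum(CM(:));                                  % 2252 pixels in the matrix
po = trace(CM) / N;                              % observed agreement, 0.5724
pe = (sum(CM,1) * sum(CM,2)) / N^2;              % agreement expected by chance
kappa = (po - pe) / (1 - pe);                    % 0.3823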

 

 

 



[1] Ian T. Nabney, Netlab: Algorithms for Pattern Recognition, Springer, 2001. http://www.ncrg.aston.ac.uk/netlab/

 

[2] In a question about the suitability of neural networks for tasks like ours, Gary W. Lynn from NeuroDimension wrote to us: '…since I am more familiar with the technical aspects of the software. You are right that neural networks do not handle large images very well. I would recommend that you try to do some pre-processing on the images to extract out only the most relevant features or sub-images.' http://www.nd.com/