As we were very much interested in our project, we tried to use the available data in as many approaches as time allowed.
From the very beginning we noticed that it would be difficult to classify the entire images (1000x1000 pixels) using a neural network. We studied the functionality available in the Netlab software [1], NeuroSolutions [2], and Matlab's Neural Network Toolbox, and since we were more familiar with Matlab, we decided to use Matlab on a 256x256 area where we have training data.
The newff function creates a feed-forward back-propagation network. Feed-forward networks consist of:
1. Nl layers using the dot product weight function.
2. A net input function that calculates a layer's net input by combining its weighted inputs and biases.
3. A specified transfer function for each layer.
The first layer has weights coming from the input, and each subsequent layer has weights coming from the previous layer. The last layer is the network output.
The general form of the function is:
newff(PR, [S1 S2...SNl], {TF1 TF2...TFNl}, BTF, BLF, PF)
where PR is an R x 2 matrix of min and max values for the R input elements. Since we treat the images as vectors, we will have a 6x2 matrix that contains the minimum and the maximum value of each image in a row.
Si represents the size of the ith layer; e.g. when we have a first layer of two neurons and an output layer of one neuron, the S vector will look like [2 1].
For each layer we have to define a transfer function. Transfer functions calculate a layer's output from its net input. The default TFi is 'tansig', which calculates its output according to:
tansig(n) = 2/(1+exp(-2*n)) - 1
'logsig' is the log-sigmoid transfer function. logsig(N) takes one input, N, an S x Q matrix of net input (column) vectors, and returns each element of N squashed between 0 and 1:
logsig(n) = 1 / (1 + exp(-n))
BTF – the back-propagation network training function, default = 'traingdx'.
BLF – the back-propagation weight/bias learning function, default = 'learngdm'.
PF – the performance function; the default is 'mse', which measures the network's performance according to the mean of squared errors.
trainlm is
a network training function that updates weight and bias values according to
Levenberg-Marquardt optimization.
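As an illustration, the following is a minimal sketch of how such a net can be created and trained with these parameters; the variable names (P for a 6 x N matrix holding the six image vectors as rows, T for the 1 x N target vector) and the epoch/goal values are assumptions made for the example, not taken from our scripts:

% P: 6 x N matrix of image vectors (rows), T: 1 x N targets (assumed names)
PR = [min(P,[],2) max(P,[],2)];     % the 6x2 matrix of per-image min and max
net = newff(PR, [2 1], {'tansig' 'logsig'}, 'trainlm');
net.trainParam.epochs = 100;        % maximum number of epochs (example value)
net.trainParam.goal = 1e-3;         % mse performance goal (example value)
net = train(net, P, T);
Y = sim(net, P);                    % simulate the trained net on the input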
The measure of accuracy
As we have only one class in the training data, we assign the foreground to "ones" and the background to "zeros". Therefore we assume a correct output when the value is larger than 0.99 or smaller than 0.01, when the training data is equal to 1.0 and 0.0 respectively, i.e.:
if (target==1 & output>0.99)
    correct = correct + 1;
elseif (target==0 & output<0.01)
    correct = correct + 1;
end
We compute the accuracy of three nets and select the one with the highest accuracy as the best net to simulate the input data with (see the sketch after the next paragraph).
We use half of both the training and the input data (every second value) for training, and use the other half for testing. Typically one epoch of training is defined as a single presentation of all input vectors to the network; the network is then updated according to the results of all those presentations. Training occurs until a maximum number of epochs has elapsed, the performance goal is met, or another stopping condition is reached.
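A minimal sketch of this split and of selecting the best of the three nets, using the same assumed variable names as above and the correctness criterion given earlier:

% every second value for training, the rest for testing
Ptrain = P(:,1:2:end);  Ttrain = T(1:2:end);
Ptest  = P(:,2:2:end);  Ttest  = T(2:2:end);
bestAcc = -1;
for k = 1:3                          % train three nets, keep the most accurate
    net = newff(PR, [2 1], {'tansig' 'logsig'}, 'trainlm');
    net = train(net, Ptrain, Ttrain);
    y = sim(net, Ptest);
    acc = mean((Ttest==1 & y>0.99) | (Ttest==0 & y<0.01));
    if acc > bestAcc
        bestAcc = acc;
        bestNet = net;               % the net used to classify the full image
    end
end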
The figure below shows the training step, and how the mean of the squared errors decreases significantly in the first few epochs.
Figure 100, the training step.
The classified image clearly shows the difference between the cleared areas and the unchanged area; we can even distinguish at least one more group of fields, namely the light blue area in the classified image.
Comparing the false color composite and the training data, and because of the significantly higher values of the cleared area and the large difference from the background, it is easy to select a threshold value, which in this case was 0.18.
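In Matlab this is a single comparison; cls93 is our assumed name for the classified image of 1993:

t = 0.18;                  % threshold selected as described above
thr93 = cls93 > t;         % binary image of the cleared areas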
It is important to mention that we noticed relatively large differences between the training data and the classified image. In some cases shadows break one field down into two or more parts (see the ellipses); in other cases the false color composite image clearly shows disconnected fields, while they appear connected in the training data.
Figure 101, the inputs and the results for the data from year 1993: (a) the false color composite of the training data area; (b) the training data; (c) the classified image, obtained at a net accuracy of 0.83102; (d) the thresholded image, t = 0.17.
The classification of the second year's data was done the same way. We notice that the polygons are generally thicker in the training data than in the false color composite, and that is also the case in the classified image.
Figure 102, the inputs and the results for the data from year 1998: (a) the false color composite of the training data area; (b) the training data; (c) the classified image, obtained at a net accuracy of 0.85934; (d) the thresholded image, t = 0.20.
We expect that this will significantly affect
the accuracy assessment.
After obtaining the thresholded images we label them differently, to be able to distinguish the classes when we combine them into one image; for instance, we assign the cleared areas of 1993 the value 10, and the cleared areas of 1998 the value 20. Adding the images, we will probably get areas that overlap between the two years, which represent no-change areas. Because the polygons represent cleared areas, we name the polygons from 1993 covered areas when we plot them together with year 1998.
Figure 103, the combined training data 93-98
We do the
same for the thresholded-classified images.
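A minimal sketch of this labeling and combination step, assuming thr93 and thr98 are the binary thresholded images of the two years:

lbl93 = 10 * double(thr93);    % cleared areas in 1993 -> 10
lbl98 = 20 * double(thr98);    % cleared areas in 1998 -> 20
combined = lbl93 + lbl98;      % 30 marks areas cleared in both years (no change)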
By visual comparison of the two images alone, we can see that they differ a lot, even though the shapes of the polygons are generally preserved.
Figure 104, the combined classified image that represents the changes that happened during 1993-1998.
We then computed the confusion matrix and, as we were expecting, obtained the following results. The next table shows the confusion matrix and the kappa value.
Neural Network 93-98

| Training \ Classified | U to C      | C to U   | NC to B  | Sum of rows |
| U to C                | 536         | 0        | 55       | 591         |
| C to U                | 5           | 726      | 42       | 773         |
| NC                    | 528         | 333      | 27       | 888         |
| Diagonal sum          |             |          |          | 1289        |
| Cls sum               | 1069        | 1059     | 124      |             |
| Producer Accuracy     | 0.501403181 | 0.685552 | 0.217742 |             |
| User Accuracy         | 0.906937394 | 0.939198 | 0.030405 |             |
| Overall Accuracy      | 0.572380107 |          |          |             |
| Kappa                 | 0.382320623 |          |          |             |
where:
U to C: uncovered to covered
C to U: covered to uncovered
NC: unchanged
In addition, 157 pixels were classified as no change. The NC to B column represents the no-change areas that were classified as background.
So, from a total of 2409 pixels, only 1289 pixels were classified correctly according to the training data, which we think is not of a quality that can be used for statistical analysis.
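For reference, the overall accuracy and the kappa value follow from the confusion matrix by the standard formulas; a small sketch using the counts from the table above:

C = [536 0 55; 5 726 42; 528 333 27];    % the 3x3 confusion matrix
N = sum(C(:));                           % pixels in the matrix (2252)
po = trace(C) / N;                       % overall accuracy, 1289/2252 = 0.5724
pe = sum(sum(C,2) .* sum(C,1)') / N^2;   % chance agreement from the marginals
kappa = (po - pe) / (1 - pe);            % 0.3823, as in the table
prodAcc = diag(C)' ./ sum(C,1);          % producer accuracy per class
userAcc = diag(C)  ./ sum(C,2);          % user accuracy per class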
[1] Ian T. Nabney, Netlab: Algorithms for Pattern Recognition, Springer, 2001. http://www.ncrg.aston.ac.uk/netlab/
[2] In a question about the efficiency of neural networks for tasks like ours, Gary W. Lynn from NeuroDimension wrote to us: '…since I am more familiar with the technical aspects of the software. You are right that neural networks do not handle large images very well. I would recommend that you try to do some pre-processing on the images to extract out only the most relevant features or sub-images.' http://www.nd.com/