When I needed to plot classifier decision boundaries for my thesis, I decided to do it as simply as possible. Although the decision boundaries between classes can be derived analytically, plotting them for more than two classes gets a bit complicated: you have to find the intersection of the regions that are all assigned to a particular class and then find the expression for the boundary of that class. If analytical boundaries are not necessary, a brute-force computational approach can be used instead. This tutorial takes that approach: the feature space is divided into a grid and each grid cell is classified. The classified map is then shown as an image behind a scatter plot of the training data. This is an application of how to plot over an image background in MATLAB. The result will look something like this (for a city block classifier):

First, create some random training data for 3 classes. Here, I will use some random Gaussian-distributed points with a different mean for each class. After generating the samples, also compute the class means to use as the class prototype for the classifier distance measure. In this tutorial, city block distance from the class mean will be used as the distance measure, so only the class mean needs to be computed.

    % number of training samples to generate.
    nsamples = 20;

    % create some training data for three classes.
    training = cell(3,1);
    training{1} = randn(nsamples,2) + repmat([-2 -2], [nsamples 1]);
    training{2} = randn(nsamples,2) + repmat([2 3], [nsamples 1]);
    training{3} = randn(nsamples,2) + repmat([-3 2], [nsamples 1]);

    % sample mean
    sample_means = cell(length(training),1);

    % compute sample mean to use as the class prototype.
    for i=1:length(training),
        sample_means{i} = mean(training{i});
    end

The technique used to plot the decision boundaries is to make an image in which each pixel represents a grid cell in the 2D feature space, so the image as a whole defines a grid over that space. The pixels of the image are then classified, which assigns a class label to each grid cell. The classified image is then used as a background for a scatter plot that shows the data points of each class.

The advantage of this technique is that because it is classifying grid points in the 2D feature space, decision boundaries can be computed for any arbitrary distance measure. The disadvantage is that there is a computational cost for making very fine decision boundary maps, as you would have to make the grid finer and finer.
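To make the "any arbitrary distance measure" point concrete, here is a minimal sketch in which the distance measure is a function handle. The names `cityblock`, `euclidean`, `distfun` and `pts` are hypothetical and are not part of the tutorial code; the Euclidean distance is only an example of a measure you could swap in.

```matlab
% Hypothetical helpers: each takes N-by-2 points and a 1-by-2 prototype
% and returns an N-by-1 vector of distances.
cityblock = @(xy, mu) sum(abs(xy - repmat(mu, [size(xy,1) 1])), 2);
euclidean = @(xy, mu) sqrt(sum((xy - repmat(mu, [size(xy,1) 1])).^2, 2));

% Pick one. The grid-based approach does not depend on which is chosen.
distfun = euclidean;

% Two toy points measured from the origin.
pts = [0 0; 3 4];
d_city = cityblock(pts, [0 0]);   % city block distances: [0; 7]
d_eucl = distfun(pts, [0 0]);     % Euclidean distances:  [0; 5]
```

Only the line that computes the distance would change in the classification step; the grid, the `min` and the plotting would stay the same.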

The code below sets this up.

    % set up the domain over which you want to visualize the decision
    % boundary
    xrange = [-8 8];
    yrange = [-8 8];

    % step size for how finely you want to visualize the decision boundary.
    inc = 0.1;

    % generate grid coordinates. this will be the basis of the decision
    % boundary visualization.
    [x, y] = meshgrid(xrange(1):inc:xrange(2), yrange(1):inc:yrange(2));

    % size of the (x, y) image, which will also be the size of the
    % decision boundary image that is used as the plot background.
    image_size = size(x);

    xy = [x(:) y(:)]; % make (x,y) pairs as a bunch of row vectors.

In the above code, the domain over which you want to display the decision boundaries is specified by `xrange` and `yrange`. The step size (`inc`) determines how fine the resolution of the decision boundary image is: the smaller you make it, the smoother the final result.
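As a rough illustration of the cost side of that trade-off, the sketch below counts the grid cells for a few step sizes over the same domain. The step sizes in `incs` are illustrative values chosen for this example, not values used elsewhere in the tutorial.

```matlab
% Sketch: how the number of grid cells (and so the number of
% classifications) grows as the step size shrinks.
xrange = [-8 8];
yrange = [-8 8];
incs = [1 0.5 0.25];               % illustrative step sizes
ncells = zeros(size(incs));
for k = 1:numel(incs)
    nx = numel(xrange(1):incs(k):xrange(2));  % number of grid columns
    ny = numel(yrange(1):incs(k):yrange(2));  % number of grid rows
    ncells(k) = nx * ny;
end
disp([incs(:) ncells(:)]);         % one row per step size
```

Each halving of the step size roughly quadruples the number of cells that have to be classified.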

The `meshgrid` function generates two images, `x` and `y`:

As can be seen above, the pixel values in the `x` image increase from left to right. Each row of `x` in fact has the value `xrange(1):inc:xrange(2)` and each column of `y` has the value `yrange(1):inc:yrange(2)`. Together, the two images returned by `meshgrid` define a grid in the 2D feature space: if you take corresponding pixels and write them as a coordinate pair, you get the location of a grid cell. When you re-arrange each image into a column vector and then concatenate the two (as in the last line of the code above), you get a set of row vectors, where each row is the `(x, y)` coordinate pair of a grid cell in the 2D feature space. Re-arranging in this fashion is not strictly necessary but will make the following code a bit easier to follow. Also, instead of `x(:)` you can use `reshape` to do the same thing, but the code will be much longer:

    xy = [reshape(x, image_size(1)*image_size(2), 1) ...
          reshape(y, image_size(1)*image_size(2), 1)];
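A tiny grid makes the layout easier to see. This is only a sketch on made-up values (3 columns by 2 rows), not the grid used in the tutorial, and `xy2` and `same` are names introduced just for the comparison.

```matlab
% Toy grid: 3 columns (x values) by 2 rows (y values).
[x, y] = meshgrid(1:3, [10 20]);
% x = [1 2 3; 1 2 3]          every row is 1:3
% y = [10 10 10; 20 20 20]    every column is [10; 20]

% Column-major flattening pairs up corresponding pixels, one pair per row.
xy = [x(:) y(:)];
% xy = [1 10; 1 20; 2 10; 2 20; 3 10; 3 20]

% The reshape form produces the same pairs.
xy2 = [reshape(x, numel(x), 1) reshape(y, numel(y), 1)];
same = isequal(xy, xy2);
```

Note that the pairs run down each image column first, because MATLAB stores arrays in column-major order.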

Now all that needs to be done is classify each coordinate pair and then re-arrange the result back into an image. That image will indicate the class label for each pixel. But because this is all derived from the `meshgrid` result, the label for each image pixel is actually the label for that grid cell in the 2D feature space.

Classification of the `xy` data is accomplished like so:

    numxypairs = size(xy, 1); % number of (x,y) pairs (rows of xy)

    % distance measure evaluations for each (x,y) pair.
    dist = [];

    % loop through each class and calculate distance measure for each (x,y)
    % from the class prototype.
    for i=1:length(training),
        % calculate the city block distance between every (x,y) pair and
        % the sample mean of the class.
        % the sum is over the columns to produce a distance for each (x,y)
        % pair.
        disttemp = sum(abs(xy - repmat(sample_means{i}, [numxypairs 1])), 2);

        % concatenate the calculated distances.
        dist = [dist disttemp];
    end

    % for each (x,y) pair, find the class that has the smallest distance.
    % this will be the min along the 2nd dimension.
    [m,idx] = min(dist, [], 2);

The loop calculates the distance measure (in this case city block distance) from each class mean for every grid cell. The results are concatenated into an array (`dist`) with one column per class. Note, though, that within the loop the distance for all coordinate pairs is calculated in one instruction (`disttemp`). Languages like MATLAB allow this sort of “vectorization” and it is much faster than looping through each grid cell and calculating the value one at a time. I did not use vectorization for the first maximum likelihood classifier I had to code and it was about 100 times slower than the vectorized version.
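To show what the vectorized instruction replaces, the sketch below computes the same city block distances both ways on a handful of made-up points. The points, the class mean `mu` and the names `dvec` and `dloop` are hypothetical and only serve the comparison.

```matlab
% Toy data: four (x,y) pairs and one hypothetical class mean.
xy = [0 0; 1 2; -3 4; 2 -1];
mu = [1 1];
n = size(xy, 1);

% Vectorized: one expression covers all pairs (same form as disttemp).
dvec = sum(abs(xy - repmat(mu, [n 1])), 2);

% Loop: one pair at a time, for comparison.
dloop = zeros(n, 1);
for k = 1:n
    dloop(k) = abs(xy(k,1) - mu(1)) + abs(xy(k,2) - mu(2));
end
% Both give [2; 1; 7; 3].
```

The two versions agree; the difference is only in how many interpreted loop iterations MATLAB has to run.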

Once the distances are calculated for each class, the final line in the code above determines the class to assign to each grid cell by finding the class that has the minimum distance. Here, the `min` function's `idx` output is used: because column `i` of `dist` holds the distances to class `i`, the column index of the minimum in each row is the class number itself.
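A small made-up distance matrix shows how the second output of `min` doubles as the class label. The values below are invented for illustration only.

```matlab
% Toy distance matrix: 3 grid cells (rows) by 3 classes (columns).
dist = [3   1 2;
        0.5 4 4;
        2   2 1];

% Minimum along the 2nd dimension: one result per row (grid cell).
[m, idx] = min(dist, [], 2);
% m   = [1; 0.5; 1]   smallest distance for each cell
% idx = [2; 1; 3]     column of that minimum, which is the class number
```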

The `idx` variable contains the label information that is needed, but it is one long column vector. The important thing to note is that each element in `idx` corresponds exactly to a coordinate pair in `xy`, and each row of `xy` corresponds exactly to a pixel coordinate (or raster / image coordinate) in the `x` and `y` images. So undoing the re-arranging that was done when `xy` was created will produce an image where the classifier decision is shown for each grid cell. This is how:

    % reshape the idx (which contains the class label) into an image.
    decisionmap = reshape(idx, image_size);
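The round trip can be checked on a toy grid. In this sketch the "classifier" is a stand-in rule invented for the example (label 1 where y is 10, label 2 where y is 20), so that the expected image is obvious.

```matlab
% Toy grid: 3 columns by 2 rows.
[x, y] = meshgrid(1:3, [10 20]);
image_size = size(x);
xy = [x(:) y(:)];

% Stand-in classifier (hypothetical): label depends only on y.
idx = 1 + (xy(:,2) > 15);          % column vector [1; 2; 1; 2; 1; 2]

% Undo the flattening: labels land back on the pixels they came from.
decisionmap = reshape(idx, image_size);
% decisionmap = [1 1 1; 2 2 2], matching the rows of y (10s then 20s)
```

Because `x(:)` and `reshape` both use column-major order, the label for each pair ends up at the same pixel position as the pair's coordinates in `x` and `y`.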

Once the image is created, it can simply be shown:

    figure;

    % show the image
    imagesc(xrange,yrange,decisionmap);
    hold on;
    set(gca,'ydir','normal');

    % colormap for the classes:
    % class 1 = light red, 2 = light green, 3 = light blue
    cmap = [1 0.8 0.8; 0.95 1 0.95; 0.9 0.9 1];
    colormap(cmap);

    % plot the class training data.
    plot(training{1}(:,1),training{1}(:,2), 'r.');
    plot(training{2}(:,1),training{2}(:,2), 'go');
    plot(training{3}(:,1),training{3}(:,2), 'b*');

    % include legend
    legend('Class 1', 'Class 2', 'Class 3','Location','NorthOutside', ...
        'Orientation', 'horizontal');

    % label the axes.
    xlabel('x');
    ylabel('y');

This will then get you the decision boundary plot:

The three lines that show the image are an interesting example of how to plot over an image background. If you have read that tutorial, you will probably notice that this is actually one of its “wrong” examples (Example 2), in which the image ends up flipped. To understand why the image is not flipped in this case, take a look at these lines from the code above:

    imagesc(xrange,yrange,decisionmap);
    hold on;
    set(gca,'ydir','normal');

According to the other post, this will make a flipped image. And yet this decision boundary is shown correctly. The reason is that when `meshgrid` generated `x` and `y`, it generated them such that increasing raster row coordinates represent increasing Y coordinates of the grid. Raster row 0 is always associated with the smallest Y-coordinate shown on the axes, which is MATLAB's default way of showing images. Thus, all that's needed is to set the Y-axis direction to “normal” in order to get small Y at the bottom of the plot. This means that the decision boundary image shows correctly without all the flipping necessary in the other tutorial. In the case of the other tutorial, the image *must* be shown with raster row 0 at the top (since it is showing an image depicting a meaningful scene). In *this* tutorial, raster row 0 simply needs to be shown with the smallest Y-value on the axes.
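The row ordering can be checked directly on the `y` image. This sketch uses a coarse step of 4 purely to keep the grid small; the names `first_row_y` and `last_row_y` are introduced only for this check.

```matlab
% Coarse grid over the same domain as the tutorial.
xrange = [-8 8];
yrange = [-8 8];
[x, y] = meshgrid(xrange(1):4:xrange(2), yrange(1):4:yrange(2));

% The first raster row (raster row 0 in the text, index 1 in MATLAB)
% holds the smallest Y; the last row holds the largest Y.
first_row_y = y(1, 1);      % -8
last_row_y  = y(end, 1);    %  8
```

Since the first row already carries the smallest Y value, setting `'ydir'` to `'normal'` is enough to put it at the bottom of the plot.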

This same technique adapts to 1D decision boundaries too: create a 1-pixel-high image, classify it, and show it as the background of a plot of your probability density function or histogram.
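A minimal sketch of the 1D case follows, under some assumptions of my own: two classes with hypothetical prototype means at -2 and 1, absolute distance as the measure, and a y-range of `[0 1]` standing in for the height of whatever density or histogram you plot on top. The single row of labels is duplicated into two rows before display so that the two-element y-range maps onto the image unambiguously.

```matlab
% Hypothetical 1D setup: two classes with prototype means at -2 and 1.
means = [-2 1];
xrange = [-6 6];
inc = 0.5;
x = xrange(1):inc:xrange(2);          % 1-by-N grid of feature values

% Absolute distance from each class mean, one row per class.
dist = zeros(numel(means), numel(x));
for i = 1:numel(means)
    dist(i, :) = abs(x - means(i));
end

% Minimum along the 1st dimension: idx is a 1-by-N row of class labels.
[m, idx] = min(dist, [], 1);

% Show the labels as a background strip behind the plot.
yrange = [0 1];                       % assumed height of the overlaid plot
figure;
imagesc(xrange, yrange, repmat(idx, [2 1]));
hold on;
set(gca, 'ydir', 'normal');
ylim(yrange);
colormap([1 0.8 0.8; 0.9 0.9 1]);     % class 1 = light red, 2 = light blue
% plot the histogram or density curve over the strip here.
```

The boundary shows up as the x position where the background colour changes, which for this distance measure is midway between the two means.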