## Overview

Machine Learning is an immensely broad field. Not only are there questions of considerable theoretical importance, but it is also applied to a vast number of practical problems such as personal identification (there is a whole array of biometric identification techniques), video surveillance, gesture recognition, speech processing (there is much more to it than just speaker identification), data mining, and so on. Let us briefly talk about a familiar problem that illustrates some of the issues that need to be addressed. Let us think about developing a dynamic signature verification system. This means that the signature is recorded with a digitizing tablet, i.e. the signature is given as a parametric curve. We are all familiar with the fact that our signatures differ every time we sign them. Apart from the different shapes, different signatures can be of different sizes, different orientations (rotations), positions, etc. Then we may also want to compare not only the shape of the signature but also pressure, pen tilt and directions, etc. A direct comparison of signatures is clearly not feasible and one might consider comparing models of the signature. This model has to take into account that there are variations between genuine signatures. Moreover, the variations themselves may have structure that one can exploit. One is therefore naturally led to start thinking about the possibility of building a probabilistic model of the signature.

Essentially what one does is to represent a signture with a probability density function. Apart from figuring out how that can be done, it most likely involves a training process where one uses sample signatures in order to build the representative density function. Then we are presented with a new signature, either for verification or identification purposes. In both cases we need to compare the signature with the probabilistic model. This typically returns a confidence value. In the verification case we need to decide whether the confidence value is high enough to accept the signature. Thus we need to develop some kind of decision theory. Then we might realize that there can be an additional penalty for wrong decisions, a forgery can be accepted, or a genuine signature declined. Commercial banks do not want to alienate their customers by rejecting their signatures, they would rather accept a few more forgeries. This penalty should be built into our decision theory. The identification problem is basically a classification problem where a given signature needs to be assigned to a specific class, namely the correct signatory, or of course, the class of unknown signatures. These are just some of the issues that we hope to address as part of this module.

## Homework

Homework | Due Date | Selected Solutions |
---|---|---|

Homework 1 | 25 February 2011 | Solutions 1 (amended) |

Homework 2 | 4 March 2011 | Solutions 2 |

Homework 3 | 11 March 2011 | Solutions 3 |

Homework 4 | 29 April 2011 | Solutions 4 |

## Tests

Test 1 | Solutions |

Test 2 | Solutions |

## Projects

Project | Additional material | Date due |
---|---|---|

Project 1 | 6 May 2011 | |

Project 2 | Signature Samples 1 Signature Samples 2 | 27 May 2011 |

## Python

Stefan's presentations: a brief introduction to python and a quick peek at SciPy.

*Python(x,y)*is a one-click installer with Python, NumPy, SciPy, Matplotlib and Mayavi. You can grab it here.

### Documentation

- NumPy/SciPy documentation
- Titus Brown's Advanced Software Carpentry
- List of tutorials and documentation
- Python for Matlab users
- Numpy for Matlab users

### Netiquette

How to ask questions (on a mailing list) so you're likely to get satisfactory answers. The document is perhaps a little terse, but still offers good advice.