2010-03-19
Other stuff
1 \chapter{Methodology}
2 To achieve automated iris recognition, there are three main tasks: first we must locate the iris in a given image. Secondly, it is necessary to encode the iris information into a format which is amenable to calculation and computation, for example a binary string. Finally, the data must be storable, to load and compare these encodings.
4 \section{Iris Location}
5 When locating the iris there are two potential options. The software could
6 require the user to select points on the image, which is both reliable and
7 fairly accurate, however it is also time consuming and implausible for any
8 real-world application. The other option is for the software to auto-detect the
9 iris within the image. This process is computationally complex and introduces a source of error due to the inherent complexities of computer vision. However, as the software will then require less user interaction it is a major step towards producing a system which is suitable for real-world deployment, and thus became a priority for extending the program specification.
11 Locating the iris is not a trivial task since its intensity is close to that of
12 the sclera\footnote{The white part of the eye.} and is often obscured by
13 eyelashes and eyelids. However the pupil, due to its regular size and uniform
14 dark shade, is relatively easy to locate. The pupil and iris can be approximated as concentric and this provides a reliable entry point for autodetection.
16 \subsection{Pupil}
18 The pupil's intensity and location are fairly consistent in most images and
19 so it lends itself well to auto-detection. Detection of the pupil can be
20 carried out by: removing noise by applying a median blur, thresholding the image
21 to obtain the pupil, performing edge detection to obtain the pupil boundary and
22 then identifying circles.
24 A median filter is an kernel based, convolution filter which blurs an image by setting a pixel value to the median of itself with its neighbours. A na\"ive approach to applying a median filter would be to simply find the median for a given area around each
25 pixel by sorting an array and finding the median to replace the current pixel
26 with. This could be implemented with a sorting algorithm which would have runtime
27 $\mathcal{O}(n \log n)$ on average. However, since the software requires
28 several median blurs on different images we require a more efficient solution.
30 For an improved algorithm we consult Perreault's paper \cite{median2007}, which
31 describes an algorithm to create a median filter in linear time. The process
32 (see algorithm \ref{medianfilteralgor}) involves constructing individual column
33 histograms and combining them to form histograms centred around a pixel, known
34 as a kernel histogram.
36 The significant speed increase comes from the way in which the column
37 histograms are updated and combined. For each pixel we remove an old column
38 histogram from the kernel, shift a new column histogram down one pixel so it is
39 centred on the required row and then add this new histogram to the kernel
40 histogram. While this radically reduces the number of operations which need to
41 be performed for each pixel, there is an initialisation step for each row which
42 has runtime linear in the size of the kernel histogram $ \mathcal{O}(r)$. This
43 enabled the entire median filter to be applied in a matter of milliseconds.
45 \begin{algorithm}
46 \caption{Median filtering algorithm as proposed in \cite{median2007}}
47 \label{medianfilteralgor}
48 \begin{algorithmic}
49 \STATE \textbf{Input:} Image X of size $m \times n$, kernel radius $r$.
50 \STATE \textbf{Output:} Image Y of size $m \times n$.
51 \STATE Initalize each column histogram $h_0, \ldots, h_{n-1}$ as if centred on row $-1$.
52 \FOR{$i=1$ to $m$}
53 \STATE Shift the first $r$ column histograms $h_0, \ldots, h_{r-1}$ down 1 pixel.
54 \STATE Combine these $r$ column histograms to form the kernel histogram $H$.
55 \FOR{$j=1$ to $n$}
56 \STATE Set pixel $Y_{i,j}$ equal to the median of $H$.
57 \STATE Shift column histogram $h_{j+r}$ down 1 pixel.
58 \STATE Remove column histogram $h_{j-r-1}$.
59 \STATE Add column histogram $h_{j+r}$.
60 \ENDFOR
61 \ENDFOR
62 \end{algorithmic}
63 \end{algorithm}
65 \begin{figure}
66 \centering
67 \includegraphics[width=0.35\textwidth]{pupil_step_1_start}
68 \label{pupil_step_1_start}
69 \hspace{1cm}
70 \includegraphics[width=0.35\textwidth]{pupil_step_2_median}
71 \caption{Image before and after a median filter.}
72 \label{pupil_step_2_median}
73 \end{figure}
75 The overall effect of the median blur is to reduce the noise and pixel
76 intensity complexity of the iris image without perturbing the edge fidelity of
77 the original image. This results in a stronger clustering of pixel values in
78 the resultant pixel data histogram; this permits a largely noise-free, dynamic
79 analysis of features which occupy discrete pixel ranges, such as the pupil (see
80 figure \ref{example_histogram}).
82 \begin{figure}
83 \centering
84 \includegraphics[width=0.35\textwidth]{example_histogram}
85 \caption{Histogram generated in the application following a median blur of a database image. Note the distinct, left-most (darkest pixel values) peak as a typical pupil's pixel signature on the image.}
86 \label{example_histogram}
87 \end{figure}
89 Upon application of a dynamic threshold to our median blurred image the
90 leftmost image in figure \ref{pupil_step_3_thresh} is obtained. The location of
91 the pupil is well defined, and edge detection can extract this information to
92 form a single bit depth image which can be searched for circles.
94 The Sobel filter is an edge detection technique which calculates the gradients
95 (in both the $x$ and $y$ direction) of the image intensity for each pixel and
96 then combines them to give a gradient magnitude. This indicates how sharp the
97 boundary is, which is itself an indication of whether an edge is present.
99 \begin{figure}
100 \centering
101 \includegraphics[width=0.35\textwidth]{pupil_step_3_thresh}
102 \label{pupil_step_3_thresh}
103 \hspace{1cm}
104 \includegraphics[width=0.35\textwidth]{pupil_step_4_sobel}
105 \caption{Thresholded image before and after edge detection with a 2D Sobel convolution filter.}
106 \label{pupil_step_4_sobel}
107 \end{figure}
109 As can be seen in \ref{pupil_step_4_sobel} the result of this process of
110 filtering is a circular artifact denoting the extremeties of the pupil in the
111 image. This data requires further analysis for the presence of circular
112 artifacts to acquire a "best-fit" representation of the potentially elliptical
113 pupil.
115 %TODO: make this not logbook styleeee
116 %TODO: make this focused towards pupils
117 The Hough Transform is implemented to locate the pupil (and subsequently the
118 iris). Given a regular curve -- in this case a circle -- and a suitable range
119 of parameters -- $x$ centre, $y$ centre and radius -- the generalised Hough
120 transform can query individual pixels in an image for their contribution to a
121 globally consistent solution for any parameters within the valid range.
123 Due to the high computational complexity of the Hough transform technique (the
124 parameter space for a simple circle is in three dimensions and depending on
125 valid ranges this can quickly grow to millions of computations) and previous
126 considerations for application performance, a number of elements were
127 streamlined to accelerate the computation.
129 Thresholding the input image has the inherent benefit of changing the virtual
130 bit-depth of the image to 1 (black or white). Furthermore, the implemented algorithm saves
131 time by only operating on black pixels -- a greyscale implementation would
132 significantly add to the processing time. Additionally a search on the
133 threshold result (figure \ref{pupil_step_3_thresh}) gives confident parameter
134 estimations for centre and radius which can be used to constrain and guide the
135 Hough algorithm.
137 \begin{figure}
138 \centering
139 \includegraphics[width=0.35\textwidth]{pupil_step_5_hough}
140 % \label{pupil_step_5_hough}
141 \hspace{1cm}
142 \includegraphics[width=0.35\textwidth]{pupil_step_6_finish}
143 \caption{Hough transform solution for a circle (red), given edge detected pupil image.}
144 \label{pupil_step_6_finish}
145 \end{figure}
148 As can be seen in figure \ref{pupil_step_6_finish}, this process gives fast,
149 accurate and reliable solutions for circular artefacts in the thresholded and
150 edge detected pupil image. The basis of iris auto-detection relies on this
151 process.
153 \subsection{Iris}
155 Once we have the location of the pupil clearly defined, the complexity of
156 locating the iris is somewhat reduced due to the relative concentricity of the
157 pupil and iris.
159 In contrast to pupil detection, efficiently locating the iris is somewhat more
160 complicated due to (i) the obstruction of the iris by the eyelids for most
161 eyes, (ii) its irregular pattern and (iii) because of the relative similarity with the iris
162 boundary and the sclera.
164 Nevertheless, following a similar method to that of locating the pupil generally provides good results.
165 Again, a median filter is applied to the original image in order to remove noise from the image and to
166 strengthen the clustering in the pixel histogram. The resultant histogram is used to identify a point
167 that can be used to threshold the image to produce an image similar to figure \ref{iris-threshold}.
169 The method involves finding the peaks of the histogram (excluding the left-most peak that represents the pupil)
170 and choosing a treshold value between these points. Although not always optimum, this method generally creates
171 an image that the Hough Transform can use to find an approximate match for the iris; other, potentially better,
172 methods for achieving this threshold are discussed in the future work section (\ref{furtherwork}).
174 \begin{figure}
175 \centering
176 \includegraphics[width=0.4\textwidth]{iris-threshold}
177 \caption{ An example of an image that is obtained by intelligently thresholding the original input image and then using Sobel to edge detect.
178 This image represents the data that is used to locate the iris; the data is used by the Hough Transform to decide
179 on a best-fit circle to represent the location of the iris. A rough semi-circle can be made out by the human eye; this is what
180 allows the Hough Transform to accurately locate the iris. }
181 \label{iris-threshold}
182 \end{figure}
184 The Hough transform is very efficient for the task of finding the iris from an image such
185 as this because it is resilient to noise
186 and performs well even when a large amount of the circle is hidden. For
187 example, when two eyelids are covering a large portion of the iris, such
188 as in figure \ref{hough-eyelids}. This provides us with a circle that defines the iris, although
189 portions of this are still obscured by the eyelids, meaning that these also need to be defined.
192 \begin{figure}
193 \centering
194 \includegraphics[width=0.5\textwidth]{hough-eyelids}
195 \caption{ A demonstration of the results obtained by applying the Hough Transform to an image such as that in figure
196 \ref{iris-threshold}. The chosen circle is shown in red, which is an average of all of the circles with the
197 maximum number of votes (in green). The blue circles represent those with the second most votes.}
198 \label{hough-eyelids}
199 \end{figure}
202 \subsection{Eyelids}
203 We note that an image of an eye roughly has three intensities, from darkest to
204 lightest: pupil, iris and sclera, and eyelids. Hence a suitable thresholding
205 method should be able to separate the eyelids from the rest of the image. Otsu
206 gives a method in \cite{otsu1975} for appropriately thresholding grey level
207 images; the algorithm works as follows.
209 Assume we have an image with $L$ possible intensities for each pixel and $n_i$
210 is the number of pixels of intensity $i$. We first normalize the histogram into
211 a probability distribution, that is $p_i = n_i / \sum_{i=1}^{L} n_i$. We assume
212 that we threshold the image at level $k$ into two classes, $C_0$ containing
213 levels less than or equal to $k$ and $C_1$ containing the rest. We then define
214 $$\omega(k) := \sum_{i=1}^{k} p_i \text{ and } \mu(k) := \sum_{i=1}^{k} ip_i,$$
215 which are the probability of a pixel picked at random being in $C_0$ and the expected (mean) intensity for $C_0$ respectively.
217 It is clear to see that that $1-\omega(k)$ is the probability of a pixel being in $C_1$ and $\mu(L)$ is the mean intensity of the entire image. We then calculate
218 $$\sigma^2 = \frac{\left[\mu(L)\omega(k) - \mu(k)\right]^2}{\omega(k)[1 - \omega(k)},$$
219 which is shown to be measure of the variance between the two classes $C_0$ and $C_1$ by Otsu in \cite{otsu1975}. We calculate this between-class variance for each $k$ and finally threshold on the $k$ where $\sigma^2$ maximized.
221 To get any useful information from Otsu thresholding we must first ignore the
222 pupil to ensure that the result does not only consist of two classes (the pupil
223 and then everything else).
224 Once we have ignored the pupil in class calculations we find that Otsu thresholding
225 gives us two classes: one containing the iris, sclera and the pupil (since it
226 has lowest intensity), and the other class containing the eyelids, see figure
227 \ref{otsu-eyelids}.
228 With the iris located within the image we must now begin the task of extracting information from it.
230 \begin{figure}
231 \centering
232 \includegraphics[width=0.35\textwidth]{anothereye}
233 \hspace{1cm}
234 \includegraphics[width=0.35\textwidth]{otsue}
235 \caption{Image before and after a median blur and Otsu threshold.}
236 \label{otsu-eyelids}
237 \end{figure}
240 \section{Encoding the Iris}
241 %TODO: Reasons for unwrapping the iris for the filters. Polar coords.
243 The iris is encoded to a unique set of 2048 bits which serve as the fundamental
244 identification of that person's particular iris. These iris bit codes can be
245 stored in a database and then compared to uniquely identify a person. The size
246 of 2048 is sufficiently large to store the data of several particular filters
247 on most angles of the iris, while also being sufficiently small to be easily
248 stored in a database and manipulated quickly. We wish to extract phase
249 information from the iris as opposed to amplitude information since phase
250 information is not skewed by pupil deformation. We use Gabor filters to extract
251 this phase information as suggested by Daugman \cite{daugman2004}.
253 %One of the main problems with this kind of approach is the way in which we
254 %target the iris boundary in order for it to be
256 The Gabor filter is an application of the Gabor wavelet (see equation
257 \ref{gaborEquation}) \cite{daugman2004}. This will return two bits depending on
258 which quadrant (see figure \ref{quadrants}) the normalised resulting imaginary
259 number from this lies in. This equation can be simplified for computation by
260 considering it as a combination of two filters: one filter representing the
261 real part of the integral and the other representing the imaginary part. Each
262 of these filters consist of a combination of a Gaussian and a sinusoidal
263 filter (see figure \ref{gabor-combo}). The parameters $\alpha$, $\beta$ and
264 $\omega$ are then tuned by constraining the Gaussian filter to range over one
265 standard deviation and the sinusoidal filter to range from $-\pi$ to $\pi$.
267 \begin{equation}
268 \label{gaborEquation}
269 h_{\{Re,Im\}} = sgn_{\{Re,Im\}}\int _{\rho} \int _{\phi} I(\rho, \phi) e^{-i\omega(\theta _{0} - \phi)} e^{-(r_{0}-\rho)^{2}/\alpha^{2}} e^{-(\theta_{0}-\phi)^{2}/\beta^{2}} \rho d\rho d\phi
270 \end{equation}
272 \begin{figure}
273 \centering
274 \includegraphics[width=0.35\textwidth]{quadrants}
275 \caption{The four phase quadrants and the corresponding bit sequences they represent}
276 \label{quadrants}
277 \end{figure}
279 \begin{figure}
280 \centering
281 \includegraphics[width=0.8\textwidth]{gabor-combo}
282 \caption{The real (left) and imaginary (right) parts of the Gabor wavelets}
283 \label{gabor-combo}
284 \end{figure}
286 %\subsection{Heatmaps}
287 %A heatmap is created by applying a fixed size Gabor filter to every pixel in an
288 %unwrapped iris. Then the 2 bit code for each pixel is represented as a colour
289 %on the heatmap. By doing this we were able to see clearly which size filters
290 %would give us the most informative data when doing comparisons. As shown by
291 %figure \ref{heatmaps}, smaller filters produce a largely varying heatmap, and
292 %much larger filters give significantly less useful information for comparisons.
293 %We will therefore try to use small filters across the area closest to the pupil
294 %to capture the large amount of detail seen here. We will then increase the
295 %sizes of filters further out in the iris to make sure that we cover a
296 %sufficient proportion of the iris in our comparisons, but will try to balance
297 %this so that we don't use uninformative filters as in figure \ref{heatmaps}
299 \subsection{Filter Placement and Size}
300 The Gabor filters need to be placed so that they take account for a large
301 enough range of data without missing out the fine detail within the iris. As we
302 require 2 bits to uniquely identify each quadrature, we have the additional
303 constraint that 1024 filters must be used, allowing a manageable bit code
304 length of 256B for each iris which can be split into sections of 4B. To
305 simplify the calculations involved in applying the filters to the unwrapped iris they
306 can also be considered to all be placed at a uniform rotation of 0. Since it
307 can be observed that there is consistent amounts of information at points
308 varying in the angular direction in the iris, it is also a good idea to place
309 the filters uniformly in this direction. In this implementation 256 angular
310 directions were considered because it divides our
311 required number of filters evenly and includes a large spread of data across
312 the iris. However, in many cases, in the radial direction there is more
313 distinct information observed closer to the pupil.
315 To include the more important data close to the pupil the filters can be
316 positioned so that a larger proportion of them have their centres in this
317 range. With the 256 angular directions there are 4 radial filter locations left
318 to position. These can be placed uniformly between the bottom of the iris and
319 half-way across the iris to allow the data around the pupil to be explored.
320 Finally, to make sure that filters are always large enough to represent enough
321 data, the constraint that a filter centre cannot be placed on the 3 pixels
322 nearest the top and bottom of the image is added. This means that useless
323 filters of sizes 1 and 3 will never be included when encoding the iris, and we
324 avoid including additional pupil debris that may have been incorporated into
325 the iris section. The final set of filter locations across an unwrapped iris is
326 represented in figure \ref{filterpos}.
328 \begin{figure}[h!]
329 \centering
330 \includegraphics[width=0.8\textwidth]{filterpos}
331 \caption{Filter positions for an iris of radius 40 between $\theta$=0 and $\theta$=45 }
332 \label{filterpos}
333 \end{figure}
335 The final consideration with the filters is their size. Two schemes for doing
336 this are demonstrated in figure \ref{filter-size}. The first (left) aims to
337 position the largest possible filter at every filter location. This means that
338 the filters fulfil one of the original requirements; that the filters should
339 capture as much information as they can. The issue which this scheme suffers
340 from is that the filters with their centres in the middle of the iris can be so
341 large that they aren't really able to represent some of the finer detail in the
342 image. The second scheme (right) tackles this problem
343 by capping all filters in the image at a maximum size. This maximum size can be
344 tuned in the implementation to get more distinct bitcodes, which is why it is
345 favourable and consequently the one we use.
346 \begin{figure}
347 \centering
348 \includegraphics[width=0.9\textwidth]{filter-size}
349 \caption{Two schemes for filter sizes and placements at a fixed point in the angular direction in an iris of radius 40. Left: maximum size filter placed. Right: filter sizes capped at 25}
350 \label{filter-size}
351 \end{figure}
353 \section{Comparisons}
354 Iris encodings can be compared by computing a Hamming distance between them.
355 The Hamming distance between two iris codes is calculated as given in \cite{daugman2004}.
356 Assume we have two iris codes, $codeA$ and $codeB$, with corresponding bit masks,
357 $maskA$ and $maskB$ (where a 0 in the bit mask corresponds to a bad bit in that position
358 in the iris code). Then we calculate the hamming distance $H$ as:
360 $$H = \frac{\|(codeA \otimes codeB) \cap maskA \cap maskB\|}{\|maskA \cap maskB\|}. $$
362 Where $\otimes$ is the bitwise XOR operator and $\cap$ is the bitwise AND operator.
364 We can see that the numerator will be the number of differences between the
365 mutually non-bad bits of $codeA$ and $codeB$ and that the denominator will be
366 the number of mutually non-bad bits. However an astute reader may observe that
367 this measure can fail if there is an exceedingly low number of mutually non-bad bits
368 causing the numerator to be zero even though there may actually be large differences
369 between $codeA$ and $codeB$. However, this is a simple case to handle and will rarely
370 occur when comparing iris images.
372 This comparison is performed for various shifts in the codes to check for
373 rotation of the image, as it is very possible that a subject's eye in the image
374 is rotated if the camera or their head is tilted slightly.
376 This statistical independence test is the key to iris recognition. It provides
377 sufficiently many degrees-of-freedom to fail in every case where the iris
378 bitcodes of two different eyes are compared, but uniquely passed when the same
379 iris is compared with itself.
381 \subsection{Iris Database}
382 Since we are able to generate the code and the iris mask directly, these are
383 the only two items that are required to be stored in a database, the image
384 bitmaps themselves are no longer required.
386 We are currently using a basic text file as our database as this was simple to
387 implement and for the small number of eyes we have as test subjects this is
388 perfectly adequate and proves to be able to compare iris bitcodes extremely
389 fast.
391 \section{Graphical User Interface}
392 Since the software was built in Qt, a cross-platform toolkit, the final product
393 runs on Linux, Windows, and Mac desktops, adopting the operating system's
394 native toolkit and giving it an integrated feel on all platforms.
396 \begin{figure}
397 \centering
398 \includegraphics[width=0.8\textwidth]{kde}
399 \caption{Linux with Open File view.}
400 \end{figure}
401 \begin{figure}
402 \centering
403 \includegraphics[width=0.45\textwidth]{windows}
404 \includegraphics[width=0.45\textwidth]{mac}
405 \caption{Windows and Mac OS X.}
406 \end{figure}
409 Usability is an important consideration when creating an application, and
410 throughout development the aim was to provide a clean and intuitive interface
411 whilst still providing full usability and advanced features. The key
412 intellectual challenges for the user interface have been primarily to provide a
413 straight-forward and informative access point for the user, but also to provide
414 relevant extra data when required with a fall-back mechanism for difficulties
415 in auto-detection.
417 \begin{figure}
418 \centering
419 \includegraphics[width=0.5\textwidth]{heatmapapp}
420 \caption{Application with heatmap tab.}
421 \end{figure}
423 The final product includes relevant data and feedback behaviour such as
424 histogram information from the image to allow the advanced user to track-down
425 and diagnose auto-detection failures. This information is placed into a tabbed
426 interface below the primary image so that it is easily accessible without
427 encroaching on our clean interface which allows the average user
428 point-and-click refinement (if necessary) of the iris parameters with real-time
429 visual feedback without overloading them with this more advanced information.