reports/final/content/method.tex

2010-03-19

author
francis
date
Fri Mar 19 09:09:52 2010 +0000
changeset 395
7f671b74a3a8
parent 392
0d5b1a284478
child 400
8ac89faccb3d
permissions
-rw-r--r--

Other stuff

     1 \chapter{Methodology}
     2 To achieve automated iris recognition, there are three main tasks: first we must locate the iris in a given image. Secondly, it is necessary to encode the iris information into a format which is amenable to calculation and computation, for example a binary string. Finally, the data must be storable, to load and compare these encodings.
     4 \section{Iris Location}
     5 When locating the iris there are two potential options. The software could
     6 require the user to select points on the image, which is both reliable and
     7 fairly accurate, however it is also time consuming and implausible for any
     8 real-world application. The other option is for the software to auto-detect the
     9 iris within the image. This process is computationally complex and introduces a source of error due to the inherent complexities of computer vision. However, as the software will then require less user interaction it is a major step towards producing a system which is suitable for real-world deployment, and thus became a priority for extending the program specification.
    11 Locating the iris is not a trivial task since its intensity is close to that of
    12 the sclera\footnote{The white part of the eye.} and is often obscured by
    13 eyelashes and eyelids. However the pupil, due to its regular size and uniform
    14 dark shade, is relatively easy to locate. The pupil and iris can be approximated as concentric and this provides a reliable entry point for autodetection.
    16 \subsection{Pupil}
    18 The pupil's intensity and location are fairly consistent in most images and
    19 so it lends itself well to auto-detection. Detection of the pupil can be
    20 carried out by: removing noise by applying a median blur, thresholding the image
    21 to obtain the pupil, performing edge detection to obtain the pupil boundary and
    22 then identifying circles.
    24 A median filter is an kernel based, convolution filter which blurs an image by setting a pixel value to the median of itself with its neighbours. A na\"ive approach to applying a median filter would be to simply find the median for a given area around each
    25 pixel by sorting an array and finding the median to replace the current pixel
    26 with. This could be implemented with a sorting algorithm which would have runtime
    27 $\mathcal{O}(n \log n)$ on average. However, since the software requires
    28 several median blurs on different images we require a more efficient solution.
    30 For an improved algorithm we consult Perreault's paper \cite{median2007}, which
    31 describes an algorithm to create a median filter in linear time. The process
    32 (see algorithm \ref{medianfilteralgor}) involves constructing individual column
    33 histograms and combining them to form histograms centred around a pixel, known
    34 as a kernel histogram.
    36 The significant speed increase comes from the way in which the column
    37 histograms are updated and combined. For each pixel we remove an old column
    38 histogram from the kernel, shift a new column histogram down one pixel so it is
    39 centred on the required row and then add this new histogram to the kernel
    40 histogram. While this radically reduces the number of operations which need to
    41 be performed for each pixel, there is an initialisation step for each row which
    42 has runtime linear in the size of the kernel histogram $ \mathcal{O}(r)$. This
    43 enabled the entire median filter to be applied in a matter of milliseconds.
    45 \begin{algorithm}
    46 \caption{Median filtering algorithm as proposed in \cite{median2007}}
    47 \label{medianfilteralgor}
    48 \begin{algorithmic}
    49 \STATE \textbf{Input:} Image X of size $m \times n$, kernel radius $r$.
    50 \STATE \textbf{Output:} Image Y of size $m \times n$.
    51 \STATE Initalize each column histogram $h_0, \ldots, h_{n-1}$ as if centred on row $-1$.
    52 \FOR{$i=1$ to $m$}
    53 \STATE Shift the first $r$ column histograms $h_0, \ldots, h_{r-1}$ down 1 pixel.
    54 \STATE Combine these $r$ column histograms to form the kernel histogram $H$.
    55 \FOR{$j=1$ to $n$}
    56 \STATE Set pixel $Y_{i,j}$ equal to the median of $H$.
    57 \STATE Shift column histogram $h_{j+r}$ down 1 pixel.
    58 \STATE Remove column histogram $h_{j-r-1}$.
    59 \STATE Add column histogram $h_{j+r}$.
    60 \ENDFOR
    61 \ENDFOR
    62 \end{algorithmic}
    63 \end{algorithm}
    65 \begin{figure}
    66   \centering
    67 	\includegraphics[width=0.35\textwidth]{pupil_step_1_start}
    68 	\label{pupil_step_1_start}
    69 	\hspace{1cm}
    70 	\includegraphics[width=0.35\textwidth]{pupil_step_2_median}
    71 	\caption{Image before and after a median filter.}
    72 		\label{pupil_step_2_median}
    73 \end{figure}
    75 The overall effect of the median blur is to reduce the noise and pixel
    76 intensity complexity of the iris image without perturbing the edge fidelity of
    77 the original image. This results in a stronger clustering of pixel values in
    78 the resultant pixel data histogram; this permits a largely noise-free, dynamic
    79 analysis of features which occupy discrete pixel ranges, such as the pupil (see
    80 figure \ref{example_histogram}).
    82 \begin{figure}
    83 	\centering
    84 	\includegraphics[width=0.35\textwidth]{example_histogram}
    85 	\caption{Histogram generated in the application following a median blur of a database image.  Note the distinct, left-most (darkest pixel values) peak as a typical pupil's pixel signature on the image.}
    86 	\label{example_histogram}
    87 \end{figure}
    89 Upon application of a dynamic threshold to our median blurred image the
    90 leftmost image in figure \ref{pupil_step_3_thresh} is obtained. The location of
    91 the pupil is well defined, and edge detection can extract this information to
    92 form a single bit depth image which can be searched for circles.
    94 The Sobel filter is an edge detection technique which calculates the gradients
    95 (in both the $x$ and $y$ direction) of the image intensity for each pixel and
    96 then combines them to give a gradient magnitude. This indicates how sharp the
    97 boundary is, which is itself an indication of whether an edge is present.
    99 \begin{figure}
   100   \centering
   101 	\includegraphics[width=0.35\textwidth]{pupil_step_3_thresh}
   102 	\label{pupil_step_3_thresh}
   103 	\hspace{1cm}
   104 	\includegraphics[width=0.35\textwidth]{pupil_step_4_sobel}
   105 	\caption{Thresholded image before and after edge detection with a 2D Sobel convolution filter.}
   106 	\label{pupil_step_4_sobel}
   107 \end{figure}
   109 As can be seen in \ref{pupil_step_4_sobel} the result of this process of
   110 filtering is a circular artifact denoting the extremeties of the pupil in the
   111 image. This data requires further analysis for the presence of circular
   112 artifacts to acquire a "best-fit" representation of the potentially elliptical
   113 pupil.
   115 %TODO: make this not logbook styleeee
   116 %TODO: make this focused towards pupils
   117 The Hough Transform is implemented to locate the pupil (and subsequently the
   118 iris).  Given a regular curve -- in this case a circle -- and a suitable range
   119 of parameters -- $x$ centre, $y$ centre and radius -- the generalised Hough
   120 transform can query individual pixels in an image for their contribution to a
   121 globally consistent solution for any parameters within the valid range.
   123 Due to the high computational complexity of the Hough transform technique (the
   124 parameter space for a simple circle is in three dimensions and depending on
   125 valid ranges this can quickly grow to millions of computations) and previous
   126 considerations for application performance, a number of elements were
   127 streamlined to accelerate the computation.
   129 Thresholding the input image has the inherent benefit of changing the virtual
   130 bit-depth of the image to 1 (black or white). Furthermore, the implemented algorithm saves
   131 time by only operating on black pixels -- a greyscale implementation would
   132 significantly add to the processing time.  Additionally a search on the
   133 threshold result (figure \ref{pupil_step_3_thresh}) gives confident parameter
   134 estimations for centre and radius which can be used to constrain and guide the
   135 Hough algorithm.
   137 \begin{figure}
   138   \centering
   139 	\includegraphics[width=0.35\textwidth]{pupil_step_5_hough}
   140 %	\label{pupil_step_5_hough}
   141 	\hspace{1cm}
   142 	\includegraphics[width=0.35\textwidth]{pupil_step_6_finish}
   143 	\caption{Hough transform solution for a circle (red), given edge detected pupil image.}
   144 	\label{pupil_step_6_finish}
   145 \end{figure}
   148 As can be seen in figure \ref{pupil_step_6_finish}, this process gives fast,
   149 accurate and reliable solutions for circular artefacts in the thresholded and
   150 edge detected pupil image.  The basis of iris auto-detection relies on this
   151 process.
   153 \subsection{Iris}
   155 Once we have the location of the pupil clearly defined, the complexity of
   156 locating the iris is somewhat reduced due to the relative concentricity of the
   157 pupil and iris. 
   159 In contrast to pupil detection, efficiently locating the iris is somewhat more
   160 complicated due to (i) the obstruction of the iris by the eyelids for most
   161 eyes, (ii) its irregular pattern and (iii) because of the relative similarity with the iris
   162 boundary and the sclera.
   164 Nevertheless, following a similar method to that of locating the pupil generally provides good results.
   165 Again, a median filter is applied to the original image in order to remove noise from the image and to
   166 strengthen the clustering in the pixel histogram. The resultant histogram is used to identify a point
   167 that can be used to threshold the image to produce an image similar to figure \ref{iris-threshold}.
   169 The method involves finding the peaks of the histogram (excluding the left-most peak that represents the pupil)
   170 and choosing a treshold value between these points. Although not always optimum, this method generally creates
   171 an image that the Hough Transform can use to find an approximate match for the iris; other, potentially better,
   172 methods for achieving this threshold are discussed in the future work section (\ref{furtherwork}).
   174 \begin{figure}
   175   \centering
   176 	\includegraphics[width=0.4\textwidth]{iris-threshold}
   177   \caption{	An example of an image that is obtained by intelligently thresholding the original input image and then using Sobel to edge detect.
   178 						This image represents the data that is used to locate the iris; the data is used by the Hough Transform to decide
   179 						on a best-fit circle to represent the location of the iris. A rough semi-circle can be made out by the human eye; this is what
   180 						allows the Hough Transform to accurately locate the iris. }
   181     \label{iris-threshold}
   182 \end{figure}
   184 The Hough transform is very efficient for the task of finding the iris from an image such
   185 as this because it is resilient to noise
   186 and performs well even when a large amount of the circle is hidden. For
   187 example, when two eyelids are covering a large portion of the iris, such
   188 as in figure \ref{hough-eyelids}. This provides us with a circle that defines the iris, although
   189 portions of this are still obscured by the eyelids, meaning that these also need to be defined.
   192 \begin{figure}
   193   \centering
   194 	\includegraphics[width=0.5\textwidth]{hough-eyelids}
   195   \caption{	A demonstration of the results obtained by applying the Hough Transform to an image such as that in figure
   196 						\ref{iris-threshold}. The chosen circle is shown in red, which is an average of all of the circles with the 
   197 						maximum number of votes (in green). The blue circles represent those with the second most votes.}
   198     \label{hough-eyelids}
   199 \end{figure}
   202 \subsection{Eyelids}
   203 We note that an image of an eye roughly has three intensities, from darkest to
   204 lightest: pupil, iris and sclera, and eyelids. Hence a suitable thresholding
   205 method should be able to separate the eyelids from the rest of the image. Otsu
   206 gives a method in \cite{otsu1975} for appropriately thresholding grey level
   207 images; the algorithm works as follows. 
   209 Assume we have an image with $L$ possible intensities for each pixel and $n_i$
   210 is the number of pixels of intensity $i$. We first normalize the histogram into
   211 a probability distribution, that is $p_i = n_i / \sum_{i=1}^{L} n_i$. We assume
   212 that we threshold the image at level $k$ into two classes, $C_0$ containing
   213 levels less than or equal to $k$ and $C_1$ containing the rest. We then define
   214 $$\omega(k) := \sum_{i=1}^{k} p_i \text{ and } \mu(k) := \sum_{i=1}^{k} ip_i,$$
   215 which are the probability of a pixel picked at random being in $C_0$ and the expected (mean) intensity for $C_0$ respectively.
   217 It is clear to see that that $1-\omega(k)$ is the probability of a pixel being in $C_1$ and $\mu(L)$ is the mean intensity of the entire image. We then calculate
   218 $$\sigma^2 = \frac{\left[\mu(L)\omega(k) - \mu(k)\right]^2}{\omega(k)[1 - \omega(k)},$$
   219 which is shown to be measure of the variance between the two classes $C_0$ and $C_1$ by Otsu in \cite{otsu1975}. We calculate this between-class variance for each $k$ and finally threshold on the $k$ where $\sigma^2$ maximized.
   221 To get any useful information from Otsu thresholding we must first ignore the
   222 pupil to ensure that the result does not only consist of two classes (the pupil
   223 and then everything else).
   224 Once we have ignored the pupil in class calculations we find that Otsu thresholding
   225 gives us two classes: one containing the iris, sclera and the pupil (since it
   226 has lowest intensity), and the other class containing the eyelids, see figure
   227 \ref{otsu-eyelids}.
   228 With the iris located within the image we must now begin the task of extracting information from it.
   230 \begin{figure}
   231   \centering
   232 	\includegraphics[width=0.35\textwidth]{anothereye}
   233 		\hspace{1cm}
   234 	\includegraphics[width=0.35\textwidth]{otsue}
   235 	\caption{Image before and after a median blur and Otsu threshold.}
   236 	\label{otsu-eyelids}
   237 \end{figure}
   240 \section{Encoding the Iris}
   241 %TODO: Reasons for unwrapping the iris for the filters. Polar coords.
   243 The iris is encoded to a unique set of 2048 bits which serve as the fundamental
   244 identification of that person's particular iris. These iris bit codes can be
   245 stored in a database and then compared to uniquely identify a person. The size
   246 of 2048 is sufficiently large to store the data of several particular filters
   247 on most angles of the iris, while also being sufficiently small to be easily
   248 stored in a database and manipulated quickly. We wish to extract phase
   249 information from the iris as opposed to amplitude information since phase
   250 information is not skewed by pupil deformation. We use Gabor filters to extract
   251 this phase information as suggested by Daugman \cite{daugman2004}.
   253 %One of the main problems with this kind of approach is the way in which we 
   254 %target the iris boundary in order for it to be 
   256 The Gabor filter is an application of the Gabor wavelet (see equation
   257 \ref{gaborEquation}) \cite{daugman2004}. This will return two bits depending on
   258 which quadrant (see figure \ref{quadrants}) the normalised resulting imaginary
   259 number from this lies in. This equation can be simplified for computation by
   260 considering it as a combination of two filters: one filter representing the
   261 real part of the integral and the other representing the imaginary part. Each
   262 of these filters consist of a combination of a Gaussian and a sinusoidal
   263 filter (see figure \ref{gabor-combo}). The parameters $\alpha$, $\beta$ and
   264 $\omega$ are then tuned by constraining the Gaussian filter to range over one
   265 standard deviation and the sinusoidal filter to range from $-\pi$ to $\pi$.
   267 \begin{equation}
   268 \label{gaborEquation}
   269 h_{\{Re,Im\}} = sgn_{\{Re,Im\}}\int _{\rho} \int _{\phi} I(\rho, \phi) e^{-i\omega(\theta _{0} - \phi)} e^{-(r_{0}-\rho)^{2}/\alpha^{2}} e^{-(\theta_{0}-\phi)^{2}/\beta^{2}} \rho d\rho d\phi
   270 \end{equation}
   272 \begin{figure}
   273 	\centering
   274 		\includegraphics[width=0.35\textwidth]{quadrants}
   275 	\caption{The four phase quadrants and the corresponding bit sequences they represent}
   276 	\label{quadrants}
   277 \end{figure}
   279 \begin{figure}
   280   \centering
   281 	\includegraphics[width=0.8\textwidth]{gabor-combo}
   282   \caption{The real (left) and imaginary (right) parts of the Gabor wavelets}
   283   \label{gabor-combo}
   284 \end{figure}
   286 %\subsection{Heatmaps}
   287 %A heatmap is created by applying a fixed size Gabor filter to every pixel in an
   288 %unwrapped iris. Then the 2 bit code for each pixel is represented as a colour
   289 %on the heatmap. By doing this we were able to see clearly which size filters
   290 %would give us the most informative data when doing comparisons. As shown by
   291 %figure \ref{heatmaps}, smaller filters produce a largely varying heatmap, and
   292 %much larger filters give significantly less useful information for comparisons.
   293 %We will therefore try to use small filters across the area closest to the pupil
   294 %to capture the large amount of detail seen here. We will then increase the
   295 %sizes of filters further out in the iris to make sure that we cover a
   296 %sufficient proportion of the iris in our comparisons, but will try to balance
   297 %this so that we don't use uninformative filters as in figure \ref{heatmaps}
   299 \subsection{Filter Placement and Size}
   300 The Gabor filters need to be placed so that they take account for a large
   301 enough range of data without missing out the fine detail within the iris. As we
   302 require 2 bits to uniquely identify each quadrature, we have the additional
   303 constraint that 1024 filters must be used, allowing a manageable bit code
   304 length of 256B for each iris which can be split into sections of 4B. To
   305 simplify the calculations involved in applying the filters to the unwrapped iris they
   306 can also be considered to all be placed at a uniform rotation of 0. Since it
   307 can be observed that there is consistent amounts of information at points
   308 varying in the angular direction in the iris, it is also a good idea to place
   309 the filters uniformly in this direction. In this implementation 256 angular
   310 directions were considered because it divides our
   311 required number of filters evenly and includes a large spread of data across
   312 the iris. However, in many cases, in the radial direction there is more
   313 distinct information observed closer to the pupil.
   315 To include the more important data close to the pupil the filters can be
   316 positioned so that a larger proportion of them have their centres in this
   317 range. With the 256 angular directions there are 4 radial filter locations left
   318 to position. These can be placed uniformly between the bottom of the iris and
   319 half-way across the iris to allow the data around the pupil to be explored.
   320 Finally, to make sure that filters are always large enough to represent enough
   321 data, the constraint that a filter centre cannot be placed on the 3 pixels
   322 nearest the top and bottom of the image is added. This means that useless
   323 filters of sizes 1 and 3 will never be included when encoding the iris, and we
   324 avoid including additional pupil debris that may have been incorporated into
   325 the iris section. The final set of filter locations across an unwrapped iris is
   326 represented in figure \ref{filterpos}.
   328 \begin{figure}[h!]
   329   \centering
   330 	\includegraphics[width=0.8\textwidth]{filterpos}
   331   \caption{Filter positions for an iris of radius 40 between $\theta$=0 and $\theta$=45  }
   332   \label{filterpos}
   333 \end{figure}
   335 The final consideration with the filters is their size. Two schemes for doing
   336 this are demonstrated in figure \ref{filter-size}. The first (left) aims to
   337 position the largest possible filter at every filter location. This means that
   338 the filters fulfil one of the original requirements; that the filters should
   339 capture as much information as they can. The issue which this scheme suffers
   340 from is that the filters with their centres in the middle of the iris can be so
   341 large that they aren't really able to represent some of the finer detail in the
   342 image. The second scheme (right) tackles this problem
   343 by capping all filters in the image at a maximum size. This maximum size can be
   344 tuned in the implementation to get more distinct bitcodes, which is why it is
   345 favourable and consequently the one we use.
   346 \begin{figure}
   347 	\centering
   348 		\includegraphics[width=0.9\textwidth]{filter-size}
   349 	\caption{Two schemes for filter sizes and placements at a fixed point in the angular direction in an iris of radius 40. Left: maximum size filter placed. Right: filter sizes capped at 25}
   350 	\label{filter-size}
   351 \end{figure}
   353 \section{Comparisons}
   354 Iris encodings can be compared by computing a Hamming distance between them. 
   355 The Hamming distance between two iris codes is calculated as given in \cite{daugman2004}. 
   356 Assume we have two iris codes, $codeA$ and $codeB$, with corresponding bit masks, 
   357 $maskA$ and $maskB$ (where a 0 in the bit mask corresponds to a bad bit in that position 
   358 in the iris code). Then we calculate the hamming distance $H$ as:
   360 $$H = \frac{\|(codeA \otimes codeB) \cap maskA \cap maskB\|}{\|maskA \cap maskB\|}. $$
   362 Where $\otimes$ is the bitwise XOR operator and $\cap$ is the bitwise AND operator.
   364 We can see that the numerator will be the number of differences between the
   365 mutually non-bad bits of $codeA$ and $codeB$ and that the denominator will be
   366 the number of mutually non-bad bits. However an astute reader may observe that
   367 this measure can fail if there is an exceedingly low number of mutually non-bad bits
   368 causing the numerator to be zero even though there may actually be large differences
   369 between $codeA$ and $codeB$. However, this is a simple case to handle and will rarely
   370 occur when comparing iris images.
   372 This comparison is performed for various shifts in the codes to check for
   373 rotation of the image, as it is very possible that a subject's eye in the image
   374 is rotated if the camera or their head is tilted slightly.
   376 This statistical independence test is the key to iris recognition. It provides
   377 sufficiently many degrees-of-freedom to fail in every case where the iris
   378 bitcodes of two different eyes are compared, but uniquely passed when the same
   379 iris is compared with itself.
   381 \subsection{Iris Database}
   382 Since we are able to generate the code and the iris mask directly, these are
   383 the only two items that are required to be stored in a database, the image
   384 bitmaps themselves are no longer required.
   386 We are currently using a basic text file as our database as this was simple to
   387 implement and for the small number of eyes we have as test subjects this is
   388 perfectly adequate and proves to be able to compare iris bitcodes extremely
   389 fast.
   391 \section{Graphical User Interface}
   392 Since the software was built in Qt, a cross-platform toolkit, the final product
   393 runs on Linux, Windows, and Mac desktops, adopting the operating system's
   394 native toolkit and giving it an integrated feel on all platforms.
   396 \begin{figure}
   397   \centering
   398 	\includegraphics[width=0.8\textwidth]{kde}
   399   \caption{Linux with Open File view.}
   400 \end{figure}
   401 \begin{figure}
   402   \centering
   403 	\includegraphics[width=0.45\textwidth]{windows}
   404 	\includegraphics[width=0.45\textwidth]{mac}
   405 	\caption{Windows and Mac OS X.}
   406 \end{figure}
   409 Usability is an important consideration when creating an application, and
   410 throughout development the aim was to provide a clean and intuitive interface
   411 whilst still providing full usability and advanced features. The key
   412 intellectual challenges for the user interface have been primarily to provide a
   413 straight-forward and informative access point for the user, but also to provide
   414 relevant extra data when required with a fall-back mechanism for difficulties
   415 in auto-detection.
   417 \begin{figure}
   418   \centering
   419 	\includegraphics[width=0.5\textwidth]{heatmapapp}
   420   \caption{Application with heatmap tab.}
   421 \end{figure}
   423 The final product includes relevant data and feedback behaviour such as
   424 histogram information from the image to allow the advanced user to track-down
   425 and diagnose auto-detection failures. This information is placed into a tabbed
   426 interface below the primary image so that it is easily accessible without
   427 encroaching on our clean interface which allows the average user
   428 point-and-click refinement (if necessary) of the iris parameters with real-time
   429 visual feedback without overloading them with this more advanced information.

mercurial