Jeremy Johnson papers & code research projects hobbies

research





overview

The bulk of my graduate work is loosly on completing partially supplied data sets. This is exceptionally important to fields like biology and medicine where data is costly and difficult to come by, e.g., drug discovery research. This is also important to commerce, e.g., the Netflix problem. I have worked on developing new algorithms to solve certain instances of this problem. Additionally, I have worked on theory that attempts to understand the mathematical limits of a certain special case.



mathematical description

In mathematics, this problem is refered to as matrix completion. In order to describe this, let us divide the problem into two classes, the first and simplist being low-rank matrix completion (LRMC). In LRMC, there is an $m\times n$ matrix $\mathbf{X}$ of a given rank $r$, however, we are able to view only a subset of entries, denoted by a mask $\Omega$, a matrix of 1's and 0's where 1 denotes the location of an observed entry and 0 denotes otherwise. This observed matrix is denoted by $X^{\Omega}$. For example, we may be given $$ X^{\Omega} = \begin{bmatrix} 1 & 2 & \ast\\ \ast & 6 & 3\\ \end{bmatrix}, \qquad \text{where} \qquad \Omega = \begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & 1\\ \end{bmatrix}. $$ The question now is, what should the missing values of $X$ be? The answer to this depends on the value of the assumed rank $r$, i.e., if the rank is assumed to be 2, the missing entries can be any value. If we assume, however, that the rank should be 1, then it is possible to show that $X$ must be precisely $$ X = \begin{bmatrix} 1 & 2 & 1\\ 3 & 6 & 3\\ \end{bmatrix}. $$ This is prehaps not surprising since the number of equations that must be satisfied by $X$ is small and easy to solve by hand. If the values of $m$, $n$, and $r$ are large, and the percent of missing data moderate to large, the problem is infeasible to solve by hand and algorithms must be developed to find or approximate the answer. There already exist many algorithms to do this. My research on LRMC was rather focused on which patterns $\Omega$ lead to partially observed data $X^{\Omega}$ with a finite number of solutions, as in the example above. This question is linked to the very difficult problem in algebraic combinatorics of characterizing the bases of the algebraic matroid of determinental varities. Many attempts at solving this problem have been made.

Write high-rank matrix completion section.