research
overview
The bulk of my graduate work is loosly on
completing partially supplied
data sets. This is exceptionally important to fields like biology and
medicine where data is costly and difficult to come by, e.g., drug discovery
research. This is also important to commerce, e.g., the
Netflix problem. I have worked on developing new algorithms
to solve certain instances of this problem. Additionally, I have worked
on theory that attempts to understand the mathematical limits of a certain
special case.
mathematical description
In mathematics, this problem is refered to as
matrix completion.
In order to describe this, let us divide the problem into two classes, the first
and simplist being
low-rank matrix completion (LRMC). In LRMC,
there is an $m\times n$ matrix $\mathbf{X}$ of a given rank $r$, however,
we are able to view only a subset of entries, denoted by a mask $\Omega$,
a matrix of 1's and 0's where 1 denotes the location of an observed
entry and 0 denotes otherwise. This observed matrix is denoted by
$X^{\Omega}$. For example, we may be given
$$
X^{\Omega} = \begin{bmatrix}
1 & 2 & \ast\\
\ast & 6 & 3\\
\end{bmatrix},
\qquad \text{where} \qquad
\Omega = \begin{bmatrix}
1 & 1 & 0\\
0 & 1 & 1\\
\end{bmatrix}.
$$
The question now is, what should the missing values of $X$ be? The
answer to this depends on the value of the assumed rank $r$, i.e., if
the rank is assumed to be 2, the missing entries can be any value. If
we assume, however, that the rank should be 1, then it is possible to
show that $X$ must be precisely
$$
X = \begin{bmatrix}
1 & 2 & 1\\
3 & 6 & 3\\
\end{bmatrix}.
$$
This is prehaps not surprising since the number of equations that must
be satisfied by $X$ is small and easy to solve by hand. If the values of
$m$, $n$, and $r$ are large, and the percent of missing data moderate
to large, the problem is infeasible to solve by hand and algorithms
must be developed to find or approximate the answer. There already
exist many algorithms to do this. My research on LRMC was rather focused
on which patterns $\Omega$ lead to partially observed data $X^{\Omega}$
with a finite number of solutions, as in the example above. This question
is linked to the very difficult problem in algebraic combinatorics of
characterizing the bases of the algebraic matroid of determinental varities.
Many attempts at solving this problem have been made.
Write
high-rank matrix completion section.