Univariate Data with the Normal Inverse Chi-Square Distribution
===============================================================

One of the simplest kinds of data is univariate data.

Let's consider a time series example: the Annual Canadian Lynx
Trappings dataset, described by Campbell and Walker (1977), contains
the number of lynx trapped near the Mackenzie River in the Northwest
Territories of Canada between 1821 and 1934.
.. code:: python

    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    %matplotlib inline
    sns.set_context('talk')
    sns.set_style('darkgrid')
.. code:: python

    lynx = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/datasets/lynx.csv',
                       index_col=0)
.. code:: python

    lynx = lynx.set_index('time')
    lynx.head()
.. parsed-literal::

          lynx
    time
    1821   269
    1822   321
    1823   585
    1824   871
    1825  1475
.. code:: python

    lynx.plot(legend=False)
    plt.xlabel('Year')
    plt.title('Annual Canadian Lynx Trappings 1821-1934')
    plt.ylabel('Lynx')
.. image:: normalinversechisquare_files/normalinversechisquare_4_1.png
Let's plot the kernel density estimate of the annual lynx trappings:
.. code:: python

    sns.kdeplot(lynx['lynx'])
    plt.title('Kernel Density Estimate of Annual Lynx Trapping')
    plt.ylabel('Probability')
    plt.xlabel('Number of Lynx')
.. image:: normalinversechisquare_files/normalinversechisquare_6_1.png
Our plot suggests there could be three modes in the lynx data.

In modeling this time series, we could assume that the number of lynx
trapped in a given year falls into one of :math:`k` states, and that
the counts in each state are normally distributed with some unknown
mean :math:`\mu_i` and variance :math:`\sigma^2_i`.

In the case of our lynx data:

.. math:: \forall i \in [1,...,k] \hspace{2mm} p(\text{lynx trapped} \mid \text{state} = i) \sim \mathcal{N}(\mu_i, \sigma^2_i)
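To make the state model above concrete, here is a minimal simulation
sketch in plain NumPy. The number of states, the state weights, and each
state's mean and variance are purely illustrative assumptions; none of
these values are learned from the lynx data.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative parameters for k = 3 hypothetical states; in practice the
# means and variances would be latent variables learned from the data
k = 3
weights = np.array([0.5, 0.3, 0.2])       # p(state = i)
means = np.array([500.0, 2000.0, 4500.0])  # mu_i per state
stds = np.array([200.0, 600.0, 1200.0])    # sigma_i per state

# draw one simulated "annual trapping count" per year, 1821-1934
n_years = 114
states = rng.choice(k, size=n_years, p=weights)
counts = rng.normal(means[states], stds[states])
```

Each simulated year first picks a state, then draws a count from that
state's normal distribution, mirroring the conditional density above.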

Now let's consider demographic data from the Titanic Dataset. The
Titanic Dataset contains information about the passengers of the
Titanic.
.. code:: python

    ti = sns.load_dataset('titanic')
    ti.head()
.. parsed-literal::

       survived  pclass     sex  age  sibsp  parch     fare embarked  class    who  adult_male deck  embark_town alive  alone
    0         0       3    male   22      1      0   7.2500        S  Third    man        True  NaN  Southampton    no  False
    1         1       1  female   38      1      0  71.2833        C  First  woman       False    C    Cherbourg   yes  False
    2         1       3  female   26      0      0   7.9250        S  Third  woman       False  NaN  Southampton   yes   True
    3         1       1  female   35      1      0  53.1000        S  First  woman       False    C  Southampton   yes  False
    4         0       3    male   35      0      0   8.0500        S  Third    man        True  NaN  Southampton    no   True
Passenger age and fare are both real-valued. Are they related? Let's
examine the correlation matrix:
.. code:: python

    ti[['age','fare']].dropna().corr()
.. parsed-literal::

              age      fare
    age  1.000000  0.096067
    fare 0.096067  1.000000
Since the correlation between the two variables is near zero, we can
model these two real-valued columns independently.

Let's plot the kernel density estimate of each variable:
.. code:: python

    sns.kdeplot(ti['age'])
    plt.title('Kernel Density Estimate of Passenger Age in the Titanic Dataset')
.. image:: normalinversechisquare_files/normalinversechisquare_12_1.png
.. code:: python

    sns.kdeplot(ti['fare'])
    plt.title('Kernel Density Estimate of Passenger Fare in the Titanic Dataset')
.. image:: normalinversechisquare_files/normalinversechisquare_13_1.png
Given the long tail in the fare price, we might want to model this
variable on a log scale:

.. code:: python

    # note: passengers with a recorded fare of 0 produce -inf under np.log
    ti['logfare'] = np.log(ti['fare'])
    ti[['age','logfare']].dropna().corr()
.. parsed-literal::

                  age   logfare
    age      1.000000  0.135352
    logfare  0.135352  1.000000
Again, ``logfare`` and ``age`` have near-zero correlation, so we can
model these two variables independently.

Let's see what a kernel density estimate of log fare looks like:
.. code:: python

    sns.kdeplot(ti['logfare'])
    plt.title('Kernel Density Estimate of Log Passenger Fare in the Titanic Dataset')
.. image:: normalinversechisquare_files/normalinversechisquare_17_1.png
In log space, passenger fare is multimodal, suggesting that we could
model this variable with a mixture of normal distributions.

If we were to model the passenger list using our mixture model, we
would have separate likelihoods for ``logfare`` and ``age``:

.. math:: \forall i \in [1,...,k] \hspace{2mm} p(\text{logfare} \mid \text{cluster}=i)=\mathcal{N}(\mu_{i,l}, \sigma^2_{i,l})

.. math:: \forall i \in [1,...,k] \hspace{2mm} p(\text{age} \mid \text{cluster}=i)=\mathcal{N}(\mu_{i,a}, \sigma^2_{i,a})
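Because the two features are modeled independently within a cluster, a
passenger's joint log-likelihood under cluster :math:`i` is simply the
sum of the two univariate normal log-densities. A minimal sketch in
plain NumPy, using made-up parameters for one hypothetical cluster
(nothing here is learned from the Titanic data):

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    # log-density of N(mu, sigma^2) evaluated at x
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# made-up within-cluster parameters for one hypothetical cluster i
mu_l, sigma_l = 3.0, 1.0    # logfare mean / std within the cluster
mu_a, sigma_a = 30.0, 12.0  # age mean / std within the cluster

def cluster_loglik(logfare, age):
    # independence within the cluster: the joint log-likelihood is a sum
    return (normal_logpdf(logfare, mu_l, sigma_l)
            + normal_logpdf(age, mu_a, sigma_a))

# equivalently, the joint density factorizes into a product of marginals
joint = np.exp(cluster_loglik(3.2, 28.0))
product = (np.exp(normal_logpdf(3.2, mu_l, sigma_l))
           * np.exp(normal_logpdf(28.0, mu_a, sigma_a)))
```

The factorization is exactly what the two separate likelihoods above
express: each cluster carries its own parameters per feature.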

Often, real-valued data is assumed to be normally distributed. To learn
the latent variables :math:`\mu_i` and :math:`\sigma^2_i`, we would use
a normal inverse-chi-square likelihood.

The normal inverse-chi-square likelihood is the conjugate univariate
normal likelihood in data microscopes. We also have another normal
likelihood, the normal inverse-Wishart likelihood, which is optimized
for multivariate datasets.

It is important to model univariate normal data with the normal
inverse-chi-square likelihood, as it achieves superior performance on
univariate data.
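As a rough illustration of why conjugacy helps, here is a sketch of the
standard normal inverse-chi-square posterior update in plain NumPy. The
hyperparameter names (``mu0``, ``kappa0``, ``nu0``, ``sigma0_sq``) and
their default values are illustrative and are not taken from the data
microscopes API:

```python
import numpy as np

def nich_posterior(x, mu0=0.0, kappa0=1.0, nu0=1.0, sigma0_sq=1.0):
    """Standard conjugate update for a normal inverse-chi-square prior.

    mu0/kappa0 encode the prior mean and its strength; nu0/sigma0_sq
    encode the prior variance and its strength."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    # posterior scale pools the prior variance, the within-sample
    # scatter, and the prior-mean vs sample-mean discrepancy
    ss = ((x - xbar) ** 2).sum()
    sigma_n_sq = (nu0 * sigma0_sq + ss
                  + (n * kappa0 / kappa_n) * (xbar - mu0) ** 2) / nu_n
    return mu_n, kappa_n, nu_n, sigma_n_sq

# with plenty of data, the posterior mean tracks the sample mean
rng = np.random.default_rng(1)
data = rng.normal(5.0, 2.0, size=1000)
mu_n, kappa_n, nu_n, s_n = nich_posterior(data)
```

Because the update is closed-form, inference over cluster parameters
never needs numerical integration, which is what makes this likelihood
efficient for univariate data.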
In both of these examples, we found variables that were amenable to
being modeled as univariate normal:

1. Univariate datasets
2. Datasets containing real-valued variables with near-zero correlation

To import our univariate normal inverse-chi-squared likelihood, call:
.. code:: python

    from microscopes.models import nich as normal_inverse_chisquared