mixturemodel

microscopes.mixture.definition

class microscopes.mixture.definition.model_definition

Structural definition for a Dirichlet process mixture model

Parameters:

n : int

Number of observations

models : iterable of model descriptors

The component likelihood models. Each element is either x or (x, y), where x is a model_descriptor and y is a dict containing the hyperpriors. If y is omitted, the default hyperpriors for that model are used.

cluster_hyperprior : dict, optional

Describes the hyperprior for the CRP (Chinese Restaurant Process)

Notes

This class is not meant to be subclassed.

cluster_hyperprior(self)
hyperpriors(self)
models(self)
n(self)

microscopes.mixture.model

class microscopes.mixture.model.state

The underlying state of a Dirichlet Process mixture model.

You should not explicitly construct a state object. Instead, use initialize.

add_value(self, int gid, int eid, y, rng r)
assignments(self)
create_group(self, rng r)
dcheck_consistency(self)
delete_group(self, int gid)
empty_groups(self)
get_cluster_hp(self)
get_feature_dtypes(self)
get_feature_hp(self, int i)
get_feature_types(self)
get_suffstats(self, int gid, int fid)
groups(self)
groupsize(self, int gid)
is_group_empty(self, int gid)
nentities(self)
nfeatures(self)
ngroups(self)
remove_value(self, int eid, y, rng r)
sample_post_pred(self, y_new, rng r)
score_assignment(self)
score_data(self, features, groups, rng r)
score_joint(self, rng r)
score_value(self, y, rng r)
serialize(self)
set_cluster_hp(self, dict raw)
set_feature_hp(self, int i, dict d)
set_suffstats(self, int gid, int fid, dict d)
microscopes.mixture.model.initialize(model_definition defn, abstract_dataview data, rng r, **kwargs)

Initialize state to a random, valid point in the state space

Parameters:

defn : model definition

data : recarray dataview

r : random state

microscopes.mixture.model.bind(state s, abstract_dataview data)
microscopes.mixture.model.deserialize(model_definition defn, bytes)

Restore a state object from a bytestring representation.

Note that a serialized representation of a state object does not contain its own structural definition.

Parameters:

defn : model definition

bytes : bytestring representation

microscopes.mixture.runner

Implements the Runner interface for mixture models

microscopes.mixture.runner.default_assign_kernel_config(defn)

Creates a default kernel configuration for sampling the assignment (clustering) vector. The default kernel is currently a Gibbs sampler.

Parameters:

defn : mixturemodel definition

microscopes.mixture.runner.default_cluster_hp_kernel_config(defn)

Creates a default kernel configuration for sampling the clustering (Chinese Restaurant Process) model hyper-parameter. The default kernel is currently a one-dimensional slice sampler.

Parameters:

defn : mixturemodel definition

The hyper-priors set in the definition are used to configure the hyper-parameter sampling kernels.
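
As an illustration of what a one-dimensional slice sampler does, here is a minimal sketch of the textbook stepping-out and shrinkage procedure. It is not the microscopes kernel: the function slice_sample_1d, its arguments, and the Gamma(2, 1) target standing in for a hyper-parameter posterior are all assumptions made for this example.

```python
import math
import random


def slice_sample_1d(log_density, x0, width, rng):
    """Draw one new point from a 1-D density, starting at x0.

    log_density may be unnormalized and should return -inf outside
    the support. Hypothetical helper, not part of microscopes.
    """
    # Pick the slice height uniformly below the density at x0 (log space).
    log_y = log_density(x0) + math.log(1.0 - rng.random())
    # Place an interval of the given width at a random offset around x0.
    left = x0 - width * rng.random()
    right = left + width
    # Step out until both ends lie outside the slice.
    while log_density(left) > log_y:
        left -= width
    while log_density(right) > log_y:
        right += width
    # Propose uniformly from the interval, shrinking it on each rejection.
    while True:
        x1 = left + (right - left) * rng.random()
        if log_density(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1


def log_gamma_2_1(x):
    # Unnormalized log density of Gamma(shape=2, scale=1); its mean is 2.
    return math.log(x) - x if x > 0 else float('-inf')


rng = random.Random(0)
x = 1.0
samples = []
for _ in range(2000):
    x = slice_sample_1d(log_gamma_2_1, x, 1.0, rng)
    samples.append(x)
sample_mean = sum(samples) / len(samples)
```

The sampler needs only an unnormalized log density and an initial width, which is why it suits scalar hyper-parameters whose posterior is known only up to a constant.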

microscopes.mixture.runner.default_feature_hp_kernel_config(defn)

Creates a default kernel configuration for sampling the component (feature) model hyper-parameters. The default kernel is currently a one-dimensional slice sampler.

Parameters:

defn : mixturemodel definition

The hyper-priors set in the definition are used to configure the hyper-parameter sampling kernels.

microscopes.mixture.runner.default_grid_feature_hp_kernel_config(defn)

Creates a default kernel configuration for sampling the component (feature) model hyper-parameters via gridded Gibbs.

Parameters:

defn : mixturemodel definition

The hyper-priors set in the definition are used to configure the hyper-parameter sampling kernels.
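
One common reading of "gridded Gibbs" is: score the hyper-parameter at each point of a fixed grid, then draw a grid point with probability proportional to its exponentiated score. The sketch below shows that idea only. The helper sample_from_grid, the three-point grid, and the toy log score are assumptions for illustration and do not reflect how microscopes builds its grids.

```python
import math
import random


def sample_from_grid(grid, log_score, rng):
    """Return one grid value, chosen with probability proportional to
    exp(log_score(value)). Hypothetical helper, not part of microscopes."""
    log_weights = [log_score(v) for v in grid]
    # Subtract the maximum before exponentiating, for numerical stability.
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    # Inverse-CDF draw over the unnormalized weights.
    u = rng.random() * sum(weights)
    running = 0.0
    for value, weight in zip(grid, weights):
        running += weight
        if u < running:
            return value
    return grid[-1]  # guard against floating-point round-off


grid = [0.5, 1.0, 2.0]


def toy_log_score(a):
    # Toy score that peaks at a = 1.0 and is symmetric in log(a).
    return -math.log(a) ** 2


rng = random.Random(0)
draws = [sample_from_grid(grid, toy_log_score, rng) for _ in range(3000)]
frequency_of_one = draws.count(1.0) / float(len(draws))
```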

microscopes.mixture.runner.default_kernel_config(defn)

Creates a default kernel configuration suitable for general purpose inference. Currently configures an assignment sampler followed by a component hyper-parameter sampler.

Parameters:

defn : mixturemodel definition

class microscopes.mixture.runner.runner(defn, view, latent, kernel_config)

The Dirichlet process mixture model runner

Parameters:

defn : model_definition

The structural definition.

view : a recarray dataview

The observations.

latent : state

The initialization state. Note that a copy of latent is made. Use get_latent() to access the modified state.

kernel_config : list

A list whose entries are either strings x or tuples (x, y), where x is the name of a kernel and y is a dict that configures that kernel. When y is omitted, the default parameters for that kernel are used.

Possible values of x are: {‘assign’, ‘assign_resample’, ‘grid_feature_hp’, ‘slice_feature_hp’, ‘slice_cluster_hp’}
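
To make the two accepted entry shapes concrete, the following sketch expands a mixed kernel_config list into (name, options) pairs and rejects unknown kernel names. The helper normalize_kernel_config is hypothetical, the option key 'example_option' is a placeholder rather than a real kernel parameter, and representing "use the defaults" as an empty dict is an assumption of this example.

```python
# Kernel names as listed in the runner documentation.
KNOWN_KERNELS = {
    'assign', 'assign_resample', 'grid_feature_hp',
    'slice_feature_hp', 'slice_cluster_hp',
}


def normalize_kernel_config(kernel_config):
    """Expand a list of names and (name, options) tuples into tuples only.

    Hypothetical helper, not part of microscopes.
    """
    normalized = []
    for entry in kernel_config:
        if isinstance(entry, str):
            # Bare name: no explicit options were given.
            name, options = entry, {}
        else:
            name, options = entry
        if name not in KNOWN_KERNELS:
            raise ValueError('unknown kernel: %r' % (name,))
        normalized.append((name, dict(options)))
    return normalized


config = normalize_kernel_config([
    'assign',
    ('slice_cluster_hp', {'example_option': 1}),  # placeholder option key
])
```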

expensive_state
expensive_state_digest(h)
get_latent()

Returns the current value of the underlying state object.

Note that the returned value is a copy, so modifications to it will not be seen by the runner.

run(r, niters=10000)

Run the specified mixturemodel kernel for niters iterations, in a single thread.

Parameters:

r : random state

niters : int

microscopes.mixture.query

The query interface for mixturemodels.

Note that the methods of this interface all take a list of latent state objects (as opposed to a single latent).

microscopes.mixture.query.posterior_predictive(q, latents, r, samples_per_chain=1)

Generate a bag of samples from the posterior predictive distribution of each mixturemodel state object.

Parameters:

q : (N,) masked recarray

The query object

latents : list of mixturemodel latent objects

r : random state

samples_per_chain : int, optional

Default is 1.

Returns:

samples : (N, M) recarray

where M = len(latents) * samples_per_chain

Notes

If N=1, the resulting samples are not collapsed into an (M,) shape recarray, for consistency.

microscopes.mixture.query.posterior_predictive_statistic(q, latents, r, samples_per_chain=1, merge='avg')

Sample many values and combine each feature independently using the given merge strategy.

Parameters:

q : (N,) masked recarray

The query object

latents : list of mixturemodel latent objects

r : random state

samples_per_chain : int, optional

Default is 1.

merge : str or list of strs, each str is one of {‘avg’, ‘mode’}

Note that ‘mode’ only works for discrete data types.

Returns:

statistic : (N,) recarray

Notes

This method exists as a convenience, primarily because ndarray methods such as mean() do not work with recarrays.
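
The per-feature merge can be pictured with plain Python containers. In this sketch each sample is a dict keyed by feature name instead of a recarray row, and a list of strategies is matched to the features in order. The helper merge_samples, the feature names, and the in-order matching are assumptions for illustration, not the library's implementation.

```python
from collections import Counter


def merge_samples(samples, merge='avg'):
    """Combine sample rows feature by feature with 'avg' or 'mode'.

    Hypothetical helper, not part of microscopes.
    """
    names = list(samples[0].keys())
    if isinstance(merge, str):
        # A single strategy applies to every feature.
        strategies = dict((name, merge) for name in names)
    else:
        # Assumed here: one strategy per feature, in feature order.
        strategies = dict(zip(names, merge))
    result = {}
    for name in names:
        column = [row[name] for row in samples]
        if strategies[name] == 'avg':
            result[name] = sum(column) / float(len(column))
        elif strategies[name] == 'mode':
            # Most frequent value; only meaningful for discrete features.
            result[name] = Counter(column).most_common(1)[0][0]
        else:
            raise ValueError('unknown merge strategy: %r' % (strategies[name],))
    return result


rows = [
    {'height': 1.0, 'color': 2},
    {'height': 2.0, 'color': 2},
    {'height': 3.0, 'color': 0},
]
merged = merge_samples(rows, merge=['avg', 'mode'])
```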

microscopes.mixture.query.zmatrix(latents)

Compute a z-matrix (cluster co-assignment matrix). The ij-th entry of a z-matrix is a real-valued scalar in [0, 1] giving the frequency with which entities i and j appear in the same cluster.

Parameters:

latents : list of mixturemodel latent objects

The latents should all be points in the state space of the same structural model. The implementation currently does not check for this.

Returns:

zmat : (N, N) ndarray

Notes

Currently does not support a sparse zmatrix representation, so only use this for small N.
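
A dense co-assignment matrix can be sketched directly from cluster labels. The example assumes each latent's clustering is available as a list holding one group id per entity. The helper coassignment_matrix is hypothetical and uses nested lists, not an ndarray, to stay dependency-free.

```python
def coassignment_matrix(assignment_vectors):
    """Return an N x N nested list whose (i, j) entry is the fraction of
    assignment vectors placing entities i and j in the same group.

    Hypothetical helper, not part of microscopes.
    """
    n = len(assignment_vectors[0])
    counts = [[0.0] * n for _ in range(n)]
    for assignments in assignment_vectors:
        if len(assignments) != n:
            raise ValueError('all assignment vectors must have the same length')
        for i in range(n):
            for j in range(n):
                if assignments[i] == assignments[j]:
                    counts[i][j] += 1.0
    total = float(len(assignment_vectors))
    return [[value / total for value in row] for row in counts]


# Two clusterings of three entities.
zmat = coassignment_matrix([
    [0, 0, 1],
    [0, 1, 1],
])
```

Both time per latent and memory grow as N squared in this dense form, which is consistent with the note above about using it only for small N.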

Datamicroscopes is developed by Qadium, with funding from the DARPA XDATA program. Copyright Qadium 2015.