SphinxBase is the repository for fundamental data structures and algorithms used by the Sphinx speech recognition system as a whole (i.e. the decoders and trainer).

== What goes in SphinxBase ==

SphinxBase holds code that can reasonably be shared between the different decoders.  These are the decoders that we care about at the present time:

* [[PocketSphinx]] - the fastest decoder, uses a single bigram lexicon tree with multi-pass forward search
* [[FlatSphinx]] a.k.a. "Sphinx 3.0" - the most accurate decoder, uses full trigrams and triphones in flat lexicon search
* [[TreeSphinx]] a.k.a. "Sphinx 3.x" - the best general-purpose large vocabulary decoder, uses multiple lexicon trees

[[PocketSphinx]] and [[FlatSphinx]] currently take priority because our research projects concentrate on dialog systems (sub-real-time, multi-pass search) and off-line transcription (where decoding time is not a constraint).

=== Basic data structures ===

* Strings

* Lists

* Hash Tables

* Heaps (priority queues)

=== Speech data structures ===

* Pronunciation Dictionaries

* N-gram Language Models

* Gaussian Mixture Models

* Backpointer Tables

* Word Graphs
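
As an illustration of the kind of structure meant by "N-gram Language Models", here is a minimal sketch of a backoff bigram model. It is not the SphinxBase implementation: the class name `BackoffBigramModel`, its layout, and the toy probabilities are all assumptions made for this example. It stores log10 probabilities and backoff weights in the style of ARPA-format models.

```python
class BackoffBigramModel:
    """Toy backoff bigram model (hypothetical, for illustration only)."""

    def __init__(self, unigrams, bigrams):
        # unigrams: word -> (log10 probability, log10 backoff weight)
        # bigrams: (history, word) -> log10 probability
        self.unigrams = unigrams
        self.bigrams = bigrams

    def score(self, history, word):
        """Return log10 P(word | history), backing off to the unigram."""
        if (history, word) in self.bigrams:
            return self.bigrams[(history, word)]
        # Unseen bigram: add the history's backoff weight to the unigram score.
        backoff = self.unigrams[history][1] if history in self.unigrams else 0.0
        return backoff + self.unigrams[word][0]


# Made-up numbers, chosen only to exercise both code paths.
lm = BackoffBigramModel(
    unigrams={"hello": (-1.0, -0.5), "world": (-1.2, -0.3)},
    bigrams={("hello", "world"): -0.2},
)
seen = lm.score("hello", "world")    # explicit bigram entry
unseen = lm.score("world", "hello")  # backoff weight plus unigram score
```

A shared structure along these lines would let each decoder query the same model without knowing how the N-grams are stored.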

=== Basic functions and algorithms ===

* Basic File I/O (compression/decompression)
* Structured File I/O
** Model parameter files
** Acoustic feature files
** Waveform files
** All data structures noted above
* Basic linear algebra operations on Hermitian matrices
** Addition, element-wise and matrix multiplication
** Inversion and solution of linear equations (for MLLR)
** Solution of generalized eigenvalue problems (for LDA)
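
To make the "solution of linear equations" item concrete, the following is a small sketch of Gaussian elimination with partial pivoting in pure Python. The function name `solve_linear_system` and the example matrix are assumptions for illustration. A production library would more likely use a factorization suited to symmetric matrices.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back-substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


# A small symmetric positive-definite system (illustrative values).
solution = solve_linear_system([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```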

=== Speech functions and algorithms ===

* Acoustic Feature Analysis and Synthesis
* Dynamic Feature Computation
* Language Model Scoring
* GMM Computation
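
Two of the items above can be sketched briefly. The code below is illustrative only and makes several assumptions: dynamic features are taken to be a simple difference between the frames two steps ahead and two steps behind, with indices clamped at the utterance edges, and the GMM uses diagonal covariances. The function names `delta_features` and `gmm_log_likelihood` are hypothetical and do not refer to any existing SphinxBase interface.

```python
import math


def delta_features(frames, window=2):
    """First-order dynamic features: frames[t + window] - frames[t - window].

    Frame indices are clamped at the start and end of the utterance.
    """
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        ahead = frames[min(t + window, last)]
        behind = frames[max(t - window, 0)]
        deltas.append([a - b for a, b in zip(ahead, behind)])
    return deltas


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mean, var in zip(weights, means, variances):
        log_density = math.log(w)
        for xi, mu, v in zip(x, mean, var):
            log_density -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        component_logs.append(log_density)
    # Log-sum-exp keeps the sum stable when the densities are very small.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


# One-dimensional toy data, chosen only for illustration.
deltas = delta_features([[0.0], [1.0], [4.0], [9.0], [16.0]])
loglik = gmm_log_likelihood([0.0], [1.0], [[0.0]], [[1.0]])
```

Working in the log domain, as in this sketch, avoids underflow when many small per-frame likelihoods are combined over a long utterance.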

== What does not go in SphinxBase ==

* Search-related algorithms and data structures (e.g. lexicon trees)
** A generic HMM implementation might be an exception, if it is fast enough
* Training-related algorithms and data structures (e.g. Forward, Backward, Viterbi algorithms, re-estimation sums)

== Roadmap ==

* Current status (March 2007): Acoustic and dynamic feature computation is complete.  Basic data structures and functions are present but incomplete, and their interfaces are not yet consistent or clean.  [[PocketSphinx]] and [[Sphinx3]] can both be built on top of SphinxBase.

* Summer 2007: Complete the implementation of the string and linear algebra functions and structures; clean up and standardize the interfaces to the other basic classes.  Merge in the language model code and possibly the generic (multi-stream) GMM computation code.

DHDWiki: SphinxBase (last edited 2007-11-07 07:13:02 by localhost)