SphinxBase is the repository for fundamental data structures and algorithms used by the Sphinx speech recognition system as a whole (i.e. the decoders and trainer).

== What goes in SphinxBase ==

Things that should be reasonably shared between different decoders. These are the decoders that we care about at the present time:

* [[PocketSphinx]] - the fastest decoder, uses a single bigram lexicon tree with multi-pass forward search
* [[FlatSphinx]] a.k.a. "Sphinx 3.0" - the most accurate decoder, uses full trigrams and triphones in flat lexicon search
* [[TreeSphinx]] a.k.a. "Sphinx 3.x" - the best general-purpose large vocabulary decoder, uses multiple lexicon trees

Currently, [[PocketSphinx]] and [[FlatSphinx]] are priorities due to our current research projects, which are concentrated in dialog systems (sub-real-time, multi-pass search) and off-line transcription (time is no object).

=== Basic data structures ===

* Strings
* Lists
* Hash Tables
* Heaps (priority queues)

=== Speech data structures ===

* Pronunciation Dictionaries
* N-gram Language Models
* Gaussian Mixture Models
* Backpointer Tables
* Word Graphs

=== Basic functions and algorithms ===

* Basic File I/O (compression/decompression)
* Structured File I/O
** Model parameter files
** Acoustic feature files
** Waveform files
** All data structures noted above
* Basic linear algebra operations on Hermitian matrices
** Addition, element-wise and matrix multiplication
** Inversion and solution of linear equations (for MLLR)
** Solution of generalized eigenvalue problems (for LDA)

=== Speech functions and algorithms ===

* Acoustic Feature Analysis and Synthesis
* Dynamic Feature Computation
* Language Model Scoring
* GMM Computation

=== What does not go in SphinxBase ===

* Search-related algorithms and data structures (e.g. lexicon trees)
** But a generic HMM implementation might, if it's fast enough
* Training-related algorithms and data structures (e.g. Forward, Backward, Viterbi algorithms, re-estimation sums)

== Roadmap ==

* Current status (March 2007): Acoustic features, dynamic features complete, basic data structures and functions are present but not really complete. Interfaces are not consistent or nice. [[PocketSphinx]] and [[Sphinx3]] can be hosted on it.
* Summer 2007: Complete implementation of string and linear algebra functions and structures, clean up and standardize interfaces to other basic classes. Merge and incorporate language model and possibly generic (multi-stream) GMM computation code.
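
To give a concrete idea of the kind of "GMM computation" code mentioned in the lists above, here is a minimal sketch of evaluating the log-likelihood of one feature vector under a Gaussian mixture. It is written in Python purely for illustration and is not SphinxBase code: the function names (<code>diag_gauss_logpdf</code>, <code>gmm_loglik</code>), the list-based data layout, and the restriction to diagonal covariances are all assumptions of this example, and it makes no claim about how the decoders organize this computation. The mixture sum is done in the log domain (log-sum-exp) so that very small component densities do not underflow to zero.

```python
import math


def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x.

    Hypothetical helper for this sketch; x, mean and var are equal-length
    sequences of floats, and every variance must be positive.
    """
    acc = 0.0
    for xi, mi, vi in zip(x, mean, var):
        diff = xi - mi
        # Per-dimension term: log(2*pi*var) + (x - mean)^2 / var
        acc += math.log(2.0 * math.pi * vi) + diff * diff / vi
    return -0.5 * acc


def gmm_loglik(x, weights, means, variances):
    """Return log(sum_k w_k * N(x; mean_k, var_k)) using log-sum-exp.

    weights is a list of mixture weights; means and variances are lists of
    per-component vectors. Components with zero weight are skipped.
    """
    log_terms = [
        math.log(w) + diag_gauss_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
        if w > 0.0
    ]
    if not log_terms:
        return float("-inf")
    # Factor out the largest term so every exponent is <= 0.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))
```

For example, <code>gmm_loglik([0.0], [0.5, 0.5], [[0.0], [0.0]], [[1.0], [1.0]])</code> evaluates a two-component mixture whose components are identical unit Gaussians, so the result equals the log density of a single unit Gaussian at zero. A shared implementation would also need to address multiple feature streams and numeric representation, which this sketch leaves out.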