Sphincs is a speech recognition system written entirely in the C# programming language.
The diagram below shows the general architecture of Sphincs, followed by a description of each block:

[Image: architecture.gif]

Figure 1: Architecture diagram of Sphincs.

Recognizer - Contains the main components of Sphincs: the front end, the linguist, and the decoder. The application interacts with the Sphincs system mainly via the Recognizer.

Audio - The data to be decoded. This is audio in most systems, but it can also be configured to accept other forms of data, e.g., spectral or cepstral data.

Front End - Performs digital signal processing (DSP) on the incoming data.

Feature - The output of the front end is a stream of features, which are used for decoding by the rest of the system.
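To make the front end's DSP concrete, here is a toy Python sketch (illustrative only, not Sphincs code) of two common early steps in feature extraction: pre-emphasis and framing with a Hamming window. The frame and hop sizes are invented for the example.

```python
import math

def extract_frames(samples, frame_len=400, hop=160, alpha=0.97):
    """Toy front-end sketch: pre-emphasis, then framing and Hamming windowing."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    emphasized = [samples[0]] + [
        samples[n] - alpha * samples[n - 1] for n in range(1, len(samples))
    ]
    frames = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = emphasized[start:start + frame_len]
        # The Hamming window tapers the frame edges to reduce spectral leakage
        frames.append([
            s * (0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)))
            for n, s in enumerate(frame)
        ])
    return frames

frames = extract_frames([math.sin(0.01 * n) for n in range(1600)])
# 1600 samples, 400-sample frames, 160-sample hop -> 8 frames
```

A real front end would continue from these windowed frames to spectral or cepstral features such as MFCCs.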

Linguist - Embodies the linguistic knowledge of the system, namely the acoustic model, the dictionary, and the language model. The linguist produces a search graph structure on which the search manager performs search using different algorithms.

Acoustic Model - Contains a representation (often statistical) of a sound, typically created by training on large amounts of acoustic data.
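The "statistical representation" is often a Gaussian model of how a sound unit's feature values are distributed. A minimal Python sketch of that idea (the shapes and values are invented, not Sphincs data structures):

```python
import math

def log_gaussian(x, mean, var):
    """Log-likelihood of a scalar feature x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def score_vector(features, means, variances):
    """Diagonal-covariance score: sum of per-dimension log-likelihoods."""
    return sum(
        log_gaussian(x, m, v) for x, m, v in zip(features, means, variances)
    )

# A feature vector near the trained mean scores higher than one far from it
near = score_vector([0.1, -0.1], [0.0, 0.0], [1.0, 1.0])
far = score_vector([3.0, 3.0], [0.0, 0.0], [1.0, 1.0])
```

During decoding, scores like these tell the search how well each frame of features matches each candidate sound unit.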

Dictionary - Responsible for determining how a word is pronounced.
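Conceptually, a pronunciation dictionary maps each word to one or more phoneme sequences. A toy Python sketch (the data format and entries are assumptions for illustration, not the Sphincs format):

```python
# Hypothetical pronunciation dictionary: word -> list of phoneme sequences.
# Words with multiple pronunciations get multiple entries.
DICTIONARY = {
    "hello": [["HH", "AH", "L", "OW"], ["HH", "EH", "L", "OW"]],
    "world": [["W", "ER", "L", "D"]],
}

def pronunciations(word):
    """Return all known pronunciations of a word, or [] if unknown."""
    return DICTIONARY.get(word.lower(), [])
```

The linguist consults this mapping when expanding words into the sound units that the acoustic model can score.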

Language Model - Contains a representation (often statistical) of the probability of occurrence of words.
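One simple form of such a statistical representation is a bigram model, which estimates the probability of a word given the previous word. A toy Python sketch (maximum-likelihood counts only, with no smoothing; not the Sphincs implementation):

```python
from collections import Counter

def bigram_probability(corpus, w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1) from a list of tokens."""
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]

corpus = "the cat sat on the mat".split()
p = bigram_probability(corpus, "the", "cat")
# P("cat" | "the") = count("the cat") / count("the") = 1 / 2 = 0.5
```

During search, these probabilities bias the decoder toward word sequences that are likely in the language, independent of the acoustics.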

Search Graph - The graph structure produced by the linguist according to certain criteria (e.g., the grammar), using knowledge from the dictionary, the acoustic model, and the language model.
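A fragment of such a graph for a single word can be pictured as a chain of phoneme states. The Python sketch below is a deliberate simplification (invented state names; a real search graph also weaves in acoustic-model states and language-model transitions):

```python
def build_word_graph(word, phonemes):
    """Toy search-graph fragment: a linear chain of phoneme states for one
    word, returned as a list of (state, successor) edges."""
    states = ["<start>"] + [f"{word}/{p}" for p in phonemes] + ["<end>"]
    return list(zip(states, states[1:]))

edges = build_word_graph("hello", ["HH", "AH", "L", "OW"])
# [('<start>', 'hello/HH'), ('hello/HH', 'hello/AH'), ...]
```

Joining many such word chains, with language-model-weighted transitions between word ends and word starts, yields the structure the search manager explores.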

Decoder - Contains the search manager, which uses the features from the front end and the search graph from the linguist to perform the actual decoding.
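The search itself is typically a Viterbi-style dynamic program over the search graph: at each frame, every surviving hypothesis is extended along the graph's transitions and scored. A stripped-down Python sketch with made-up states and scores (not the Sphincs search manager):

```python
def viterbi(frame_scores, transitions, start, end):
    """frame_scores: one {state: acoustic log-score} dict per frame.
    transitions: {state: set of successor states}.
    Returns (best log-score, best path) ending at `end`, or None."""
    # best[state] = (cumulative log-score, path reaching that state)
    best = {start: (0.0, [start])}
    for scores in frame_scores:
        new_best = {}
        for state, (total, path) in best.items():
            for nxt in transitions.get(state, ()):
                if nxt in scores:
                    cand = total + scores[nxt]
                    # Keep only the highest-scoring hypothesis per state
                    if nxt not in new_best or cand > new_best[nxt][0]:
                        new_best[nxt] = (cand, path + [nxt])
        best = new_best
    return best.get(end)

transitions = {"<s>": {"A", "B"}, "A": {"A", "E"}, "B": {"B", "E"}}
frame_scores = [{"A": -1.0, "B": -5.0}, {"A": -1.0, "B": -1.0, "E": -1.0}]
result = viterbi(frame_scores, transitions, "<s>", "E")
# result == (-2.0, ['<s>', 'A', 'E'])
```

A production decoder adds beam pruning (discarding hypotheses far below the best score) to keep the per-frame state set tractable.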


Last edited Dec 1, 2014 at 8:01 PM by DxN, version 9