How to Build a Speech Recognition System for the Sinhala language with Kaldi (Part 3)

Hirunika Karunathilaka
5 min read · Apr 5, 2020


Hello! This tutorial series is a walkthrough of how to develop a speech recognition system for the Sinhala language. I will try to address every issue I came across and include the references I followed.

This is the 3rd tutorial of the series and I hope you have followed my previous tutorials (Part 1, Part 2).

In this tutorial, we will create our first running script and build a simple ASR system.

Special thanks go to the University of Colombo School of Computing LTRL for providing the speech corpus, language resources, and computational resources.

Create Scripts

Inside our main folder, SinhalaASR, we have to add two more bash files in addition to the path.sh we created in Tutorial 1: cmd.sh and run.sh.

cmd.sh specifies how jobs should be run (run.pl for local execution, or queue.pl for a cluster). Since we are running in a local environment, set the training, decoding, and CUDA commands to run.pl as below.

# cmd.sh
# Setting local system jobs (local CPU - no external clusters)
export train_cmd=run.pl
export decode_cmd=run.pl
export cuda_cmd=run.pl

if [[ "$(hostname -f)" == *.fit.vutbr.cz ]]; then
  queue_conf=$HOME/queue_conf/default.conf # see example /homes/kazi/iveselyk/queue_conf/default.conf
  export train_cmd="queue.pl --config $queue_conf --mem 2G --matylda 0.2"
  export decode_cmd="queue.pl --config $queue_conf --mem 3G --matylda 0.1"
  export cuda_cmd="queue.pl --config $queue_conf --gpu 1 --mem 10G --tmp 40G"
fi
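To see what $train_cmd actually does, here is a hypothetical illustration (it needs a working Kaldi setup, so run it inside the SinhalaASR folder after Tutorial 1): run.pl launches the requested jobs in parallel on the local machine and writes one log file per job.

```shell
# Hypothetical demo of run.pl; requires cmd.sh/path.sh from this project.
. ./cmd.sh
. ./path.sh
# Launch 4 parallel local jobs; run.pl substitutes JOB with 1..4 in both
# the command and the log file name.
$train_cmd JOB=1:4 exp/demo/log/hello.JOB.log echo "hello from job JOB"
cat exp/demo/log/hello.2.log
```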

Before creating the run.sh file, we have to install a language modeling toolkit for our project. For my example, I use the SRI Language Modeling Toolkit (SRILM). Go to http://www.speech.sri.com/projects/srilm/download.html and fill in the form under the download section to download the srilm.tgz archive.

Copy the downloaded srilm.tgz file, without extracting it, into the kaldi/tools folder. Open a terminal inside the tools folder and install SRILM using the bash script that is already included there:

./install_srilm.sh

After the SRILM installation, we have to add it to our path.sh file. Include the lines below in path.sh:

# Enable SRILM
. $KALDI_ROOT/tools/env.sh
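For reference, after adding that line, a complete path.sh might look like the sketch below. This is modeled on Kaldi's standard example recipes, not copied from this project; adjust KALDI_ROOT to wherever your Kaldi checkout actually lives.

```shell
# Hypothetical path.sh, modeled on Kaldi's example recipes.
# Adjust KALDI_ROOT to point at your own Kaldi checkout.
export KALDI_ROOT=$HOME/kaldi
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "Missing $KALDI_ROOT/tools/config/common_path.sh" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C
# Enable SRILM
. $KALDI_ROOT/tools/env.sh
```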

Now that’s done, let’s start writing the most important bash file — run.sh.

When you first try to execute the bash files we created, you may get a "Permission denied" error. Grant execute permission to each file as below:

chmod +x filename.sh

What is happening inside run.sh?

All the magic happens inside this file. Kaldi provides very useful scripts that cover most steps of an ASR pipeline.

The basic steps would be,

  1. Prepare acoustic data
  2. Feature extraction
  3. Prepare language data
  4. Create a language model
  5. Training ( training -> decoding -> alignment)

Here is a very simple run.sh file.

#!/bin/bash
. ./cmd.sh
. ./path.sh

# Removing previously created data (from the last run.sh execution)
#rm -rf exp mfcc data/train/spk2utt data/train/cmvn.scp data/train/feats.scp data/train/split1 data/dev/spk2utt data/dev/cmvn.scp data/dev/feats.scp data/dev/split1 data/local/lang data/lang data/local/tmp data/local/dict/lexiconp.txt

nj=4       # split training into 4 jobs
lm_order=1 # n-gram order of the language model (1 = unigram)

echo
echo "===== PREPARING ACOUSTIC DATA ====="
echo
# Making spk2utt files
utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
utils/utt2spk_to_spk2utt.pl data/dev/utt2spk > data/dev/spk2utt

echo
echo "===== FEATURES EXTRACTION ====="
echo
echo "Validate and sort data if necessary"
utils/validate_data_dir.sh data/train
utils/fix_data_dir.sh data/train
utils/validate_data_dir.sh data/dev
utils/fix_data_dir.sh data/dev

echo "Create MFCC features"
mfccdir=mfcc
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/train exp/make_mfcc/train $mfccdir
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/dev exp/make_mfcc/dev $mfccdir

echo "Making cmvn.scp files"
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train $mfccdir
steps/compute_cmvn_stats.sh data/dev exp/make_mfcc/dev $mfccdir
echo "Feature extraction successful"

echo
echo "===== PREPARING LANGUAGE DATA ====="
echo
utils/prepare_lang.sh data/local/dict "<UNK>" data/local/lang data/lang

echo
echo "===== LANGUAGE MODEL CREATION ====="
echo "===== MAKING lm.arpa ====="
echo
loc=$(which ngram-count)
if [ -z "$loc" ]; then
  if uname -a | grep 64 >/dev/null; then
    sdir=$KALDI_ROOT/tools/srilm/bin/i686-m64
  else
    sdir=$KALDI_ROOT/tools/srilm/bin/i686
  fi
  if [ -f $sdir/ngram-count ]; then
    echo "Using SRILM language modelling tool from $sdir"
    export PATH=$PATH:$sdir
  else
    echo "SRILM toolkit is probably not installed.
Instructions: tools/install_srilm.sh"
    exit 1
  fi
fi

local=data/local
mkdir -p $local/tmp
ngram-count -order $lm_order -write-vocab $local/tmp/vocab-full.txt -wbdiscount -text $local/corpus.txt -lm $local/tmp/lm.arpa

echo
echo "===== MAKING G.fst ====="
echo
lang=data/lang
arpa2fst --disambig-symbol=#0 --read-symbol-table=$lang/words.txt $local/tmp/lm.arpa $lang/G.fst

echo
echo "===== MONO TRAINING ====="
echo
steps/train_mono.sh --nj $nj --cmd "$train_cmd" data/train data/lang exp/mono || exit 1

echo
echo "===== MONO DECODING ====="
echo
utils/mkgraph.sh --mono data/lang exp/mono exp/mono/graph || exit 1
steps/decode.sh --config conf/decode.config --nj $nj --cmd "$decode_cmd" exp/mono/graph data/dev exp/mono/decode

echo
echo "===== MONO ALIGNMENT ====="
echo
steps/align_si.sh --nj $nj --cmd "$train_cmd" data/train data/lang exp/mono exp/mono_ali || exit 1

echo
echo "===== TRI1 (first triphone pass) TRAINING ====="
echo
steps/train_deltas.sh --cmd "$train_cmd" 2500 30000 data/train data/lang exp/mono_ali exp/tri1 || exit 1

echo
echo "===== TRI1 (first triphone pass) DECODING ====="
echo
utils/mkgraph.sh data/lang exp/tri1 exp/tri1/graph || exit 1
steps/decode.sh --config conf/decode.config --nj $nj --cmd "$decode_cmd" exp/tri1/graph data/dev exp/tri1/decode

echo
echo "===== TRI1 (first triphone pass) ALIGNMENT ====="
echo
steps/align_si.sh --nj $nj --cmd "$train_cmd" data/train data/lang exp/tri1 exp/tri1_ali || exit 1

echo
echo "===== run.sh script is finished ====="
echo
exit 0
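A word on the spk2utt step near the top of the script: utt2spk maps each utterance ID to its speaker ID, one pair per line, and spk2utt is simply the inverse, grouping all utterances of a speaker onto one line. The toy example below, with made-up IDs, shows the same inversion that utils/utt2spk_to_spk2utt.pl performs, using awk (the Perl script additionally validates sorting):

```shell
# Toy illustration (hypothetical utterance/speaker IDs) of the
# utt2spk -> spk2utt inversion.
mkdir -p demo
printf 'spk1_utt1 spk1\nspk1_utt2 spk1\nspk2_utt1 spk2\n' > demo/utt2spk
# Group utterances per speaker, one speaker per line:
awk '{u[$2]=u[$2]" "$1} END{for (s in u) print s u[s]}' demo/utt2spk | sort > demo/spk2utt
cat demo/spk2utt
# spk1 spk1_utt1 spk1_utt2
# spk2 spk2_utt1
```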

Kaldi also supports scoring the decoding results of the test data set (in our case, dev). A default scoring script can be found at egs/wsj/s5/local/score.sh inside the Kaldi source tree. To use this script, create a folder named "local" inside our main folder "SinhalaASR" and copy the script there.
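The copy step might look like this; the path assumes the standard Kaldi source layout, with KALDI_ROOT coming from path.sh:

```shell
# Copy Kaldi's default scoring script into our project's local/ folder.
# The guard skips the copy if the expected path does not exist.
mkdir -p local
if [ -f "$KALDI_ROOT/egs/wsj/s5/local/score.sh" ]; then
  cp "$KALDI_ROOT/egs/wsj/s5/local/score.sh" local/
fi
```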

Run the script by typing ./run.sh. This will take around 2-3 hours, depending on your machine.

This training uses Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs), so it is known as GMM-HMM training. Here are some notes about the training procedure.

The final alignments of the GMM-HMM model are obtained after first training monophone HMMs and then triphone HMMs.

Monophone model training and alignment: This model is the building block for the triphone models that follow. A monophone model represents each phone independently of its context, so monophone models usually do not give good results, since phones sound different in different contexts.

Triphone model training and alignment: A triphone model considers each phone in the context of its two neighbours, typically the preceding and following phones, which yields better alignments.

A pass of alignment is repeated after each training stage, both to improve the match between the text transcriptions and the audio and to make sure each new model starts from up-to-date alignments. Standard delta+delta-delta and LDA+MLLT training algorithms are used to obtain better alignments.

Training of triphone models takes two parameters into consideration: the number of leaves in the decision tree (HMM states) and the total number of Gaussians across all states in the model, which are used to fine-tune the model for the best alignments.
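As a sketch of where these two parameters appear in practice, a hypothetical next training stage (not part of the run.sh above) using the LDA+MLLT algorithm mentioned earlier could look like the following; the leaf and Gaussian counts are illustrative and should be tuned to your corpus size:

```shell
# Hypothetical tri2 stage: LDA+MLLT training on top of the tri1 alignments.
# 2500 = decision-tree leaves, 15000 = total Gaussians (illustrative values).
steps/train_lda_mllt.sh --cmd "$train_cmd" 2500 15000 \
  data/train data/lang exp/tri1_ali exp/tri2 || exit 1
```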

The decoded results are available inside the exp/tri1/decode/scoring/log/ folder.
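To compare the models quickly once run.sh has finished, you can pull the best WER out of each decode directory. This assumes the scoring script wrote wer_N files, one per language-model weight; lower WER is better:

```shell
# Print the best (lowest) WER line from each decode directory, if present.
for dir in exp/mono/decode exp/tri1/decode; do
  if [ -d "$dir" ]; then
    echo "== $dir =="
    grep WER "$dir"/wer_* | sort -k2 -n | head -1
  fi
done
```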

Good job :-D. You have now developed a very simple ASR system with your own data. If you run into any problems along the way, drop a comment.

