How to Build a Speech Recognition System for the Sinhala language with Kaldi (Part 3)

Hirunika Karunathilaka
5 min read · Apr 5, 2020


Hello! This tutorial series is a walkthrough of how to develop a speech recognition system for the Sinhala language. I will try to address every issue I came across and include the references I followed.

This is the 3rd tutorial of the series and I hope you have followed my previous tutorials (Part 1, Part 2).

In this tutorial, we will create our first running script and build a simple ASR system.

Special thanks go to the University of Colombo School of Computing LTRL for providing the speech corpus, language resources, and computational resources.

Create Scripts

Inside our main folder, SinhalaASR, we have to add two more bash files in addition to the path.sh we created in Tutorial 1: cmd.sh and run.sh.

cmd.sh specifies how jobs should be run (run.pl for local execution, or queue.pl for a cluster). Since we are running in a local environment, set the training, decoding, and CUDA commands to run.pl as below.

# cmd.sh
# Setting local system jobs (local CPU - no external clusters)
export train_cmd=run.pl
export decode_cmd=run.pl
export cuda_cmd=run.pl

if [[ "$(hostname -f)" == *.fit.vutbr.cz ]]; then
  queue_conf=$HOME/queue_conf/default.conf # see example /homes/kazi/iveselyk/queue_conf/default.conf
  export train_cmd="queue.pl --config $queue_conf --mem 2G --matylda 0.2"
  export decode_cmd="queue.pl --config $queue_conf --mem 3G --matylda 0.1"
  export cuda_cmd="queue.pl --config $queue_conf --gpu 1 --mem 10G --tmp 40G"
fi
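To see what $train_cmd actually does, here is a hypothetical illustration (it needs a working Kaldi setup, so run it inside the SinhalaASR folder after Tutorial 1): run.pl launches the requested jobs in parallel on the local machine and writes one log file per job.

```shell
# Hypothetical demo of run.pl; requires cmd.sh/path.sh from this project.
. ./cmd.sh
. ./path.sh
# Launch 4 parallel local jobs; run.pl substitutes JOB with 1..4 in both
# the command and the log file name.
$train_cmd JOB=1:4 exp/demo/log/hello.JOB.log echo "hello from job JOB"
cat exp/demo/log/hello.2.log
```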

Before creating the run.sh file, we have to install a language modeling toolkit for our project. For my example, I use the SRI Language Modeling Toolkit (SRILM). Go to http://www.speech.sri.com/projects/srilm/download.html and fill in the form under the download section to download the srilm.tgz archive.

Copy the downloaded srilm.tgz file, without extracting it, into the kaldi/tools folder. Open a terminal inside the tools folder and install SRILM using the bash script that is already included there:

./install_srilm.sh

After the SRILM installation, we have to add it to our path.sh file. Include the lines below in path.sh:

# Enable SRILM
. $KALDI_ROOT/tools/env.sh
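For reference, after adding that line, a complete path.sh might look like the sketch below. This is modeled on Kaldi's standard example recipes, not copied from this project; adjust KALDI_ROOT to wherever your Kaldi checkout actually lives.

```shell
# Hypothetical path.sh, modeled on Kaldi's example recipes.
# Adjust KALDI_ROOT to point at your own Kaldi checkout.
export KALDI_ROOT=$HOME/kaldi
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "Missing $KALDI_ROOT/tools/config/common_path.sh" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C
# Enable SRILM
. $KALDI_ROOT/tools/env.sh
```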

Now that’s done, let’s start writing the most important bash file — run.sh.

When you first try to execute the bash files we created, you may get a "Permission denied" error. Grant execute permission to each file as below:

chmod +x filename.sh

What is happening inside run.sh?

All the magic happens inside this file. Kaldi provides very useful scripts that cover most steps of an ASR pipeline.

The basic steps would be,

  1. Prepare acoustic data
  2. Feature extraction
  3. Prepare language data
  4. Create a language model
  5. Training ( training -> decoding -> alignment)

Here is a very simple run.sh file.

#!/bin/bash
. ./cmd.sh
. ./path.sh

# Removing previously created data (from the last run.sh execution)
#rm -rf exp mfcc data/train/spk2utt data/train/cmvn.scp data/train/feats.scp data/train/split1 data/dev/spk2utt data/dev/cmvn.scp data/dev/feats.scp data/dev/split1 data/local/lang data/lang data/local/tmp data/local/dict/lexiconp.txt

nj=4       # split training into 4 jobs
lm_order=1 # n-gram order of the language model (1 = unigram)

echo
echo "===== PREPARING ACOUSTIC DATA ====="
echo
# Making spk2utt files
utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
utils/utt2spk_to_spk2utt.pl data/dev/utt2spk > data/dev/spk2utt

echo
echo "===== FEATURES EXTRACTION ====="
echo
echo "Validate and sort data if necessary"
utils/validate_data_dir.sh data/train
utils/fix_data_dir.sh data/train
utils/validate_data_dir.sh data/dev
utils/fix_data_dir.sh data/dev

echo "Create MFCC features"
mfccdir=mfcc
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/train exp/make_mfcc/train $mfccdir
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/dev exp/make_mfcc/dev $mfccdir

echo "Making cmvn.scp files"
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train $mfccdir
steps/compute_cmvn_stats.sh data/dev exp/make_mfcc/dev $mfccdir
echo "Feature extraction successful"

echo
echo "===== PREPARING LANGUAGE DATA ====="
echo
utils/prepare_lang.sh data/local/dict "<UNK>" data/local/lang data/lang

echo
echo "===== LANGUAGE MODEL CREATION ====="
echo "===== MAKING lm.arpa ====="
echo
loc=$(which ngram-count)
if [ -z "$loc" ]; then
  if uname -a | grep 64 >/dev/null; then
    sdir=$KALDI_ROOT/tools/srilm/bin/i686-m64
  else
    sdir=$KALDI_ROOT/tools/srilm/bin/i686
  fi
  if [ -f $sdir/ngram-count ]; then
    echo "Using SRILM language modelling tool from $sdir"
    export PATH=$PATH:$sdir
  else
    echo "SRILM toolkit is probably not installed.
Instructions: tools/install_srilm.sh"
    exit 1
  fi
fi

local=data/local
mkdir -p $local/tmp
ngram-count -order $lm_order -write-vocab $local/tmp/vocab-full.txt -wbdiscount -text $local/corpus.txt -lm $local/tmp/lm.arpa

echo
echo "===== MAKING G.fst ====="
echo
lang=data/lang
arpa2fst --disambig-symbol=#0 --read-symbol-table=$lang/words.txt $local/tmp/lm.arpa $lang/G.fst

echo
echo "===== MONO TRAINING ====="
echo
steps/train_mono.sh --nj $nj --cmd "$train_cmd" data/train data/lang exp/mono || exit 1

echo
echo "===== MONO DECODING ====="
echo
utils/mkgraph.sh --mono data/lang exp/mono exp/mono/graph || exit 1
steps/decode.sh --config conf/decode.config --nj $nj --cmd "$decode_cmd" exp/mono/graph data/dev exp/mono/decode

echo
echo "===== MONO ALIGNMENT ====="
echo
steps/align_si.sh --nj $nj --cmd "$train_cmd" data/train data/lang exp/mono exp/mono_ali || exit 1

echo
echo "===== TRI1 (first triphone pass) TRAINING ====="
echo
steps/train_deltas.sh --cmd "$train_cmd" 2500 30000 data/train data/lang exp/mono_ali exp/tri1 || exit 1

echo
echo "===== TRI1 (first triphone pass) DECODING ====="
echo
utils/mkgraph.sh data/lang exp/tri1 exp/tri1/graph || exit 1
steps/decode.sh --config conf/decode.config --nj $nj --cmd "$decode_cmd" exp/tri1/graph data/dev exp/tri1/decode

echo
echo "===== TRI1 (first triphone pass) ALIGNMENT ====="
echo
steps/align_si.sh --nj $nj --cmd "$train_cmd" data/train data/lang exp/tri1 exp/tri1_ali || exit 1

echo
echo "===== run.sh script is finished ====="
echo
exit 0
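A word on the spk2utt step near the top of the script: utt2spk maps each utterance ID to its speaker ID, one pair per line, and spk2utt is simply the inverse, grouping all utterances of a speaker onto one line. The toy example below, with made-up IDs, shows the same inversion that utils/utt2spk_to_spk2utt.pl performs, using awk (the Perl script additionally validates sorting):

```shell
# Toy illustration (hypothetical utterance/speaker IDs) of the
# utt2spk -> spk2utt inversion.
mkdir -p demo
printf 'spk1_utt1 spk1\nspk1_utt2 spk1\nspk2_utt1 spk2\n' > demo/utt2spk
# Group utterances per speaker, one speaker per line:
awk '{u[$2]=u[$2]" "$1} END{for (s in u) print s u[s]}' demo/utt2spk | sort > demo/spk2utt
cat demo/spk2utt
# spk1 spk1_utt1 spk1_utt2
# spk2 spk2_utt1
```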

Kaldi also supports scoring the decoding results of the test data set (in our case, dev). A default scoring script can be found at egs/wsj/s5/local/score.sh inside the Kaldi source tree. To use this script, create a folder named "local" inside our main folder "SinhalaASR" and copy the script there.
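The copy step might look like this; the path assumes the standard Kaldi source layout, with KALDI_ROOT coming from path.sh:

```shell
# Copy Kaldi's default scoring script into our project's local/ folder.
# The guard skips the copy if the expected path does not exist.
mkdir -p local
if [ -f "$KALDI_ROOT/egs/wsj/s5/local/score.sh" ]; then
  cp "$KALDI_ROOT/egs/wsj/s5/local/score.sh" local/
fi
```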

Run the script by typing ./run.sh. This will take around 2-3 hours, depending on your machine.

This training uses Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs), so it is known as GMM-HMM training. Here are some notes about the training procedure.

The final alignments of the GMM-HMM model are obtained after first training monophone HMMs and then triphone HMMs.

Monophone model training and alignment: This model is the building block for the triphone models that follow. A monophone model represents each phone independently of its context, so monophone models usually do not give good results, since phones sound different in different contexts.

Triphone model training and alignment: A triphone model considers each phone in the context of its two neighbours, typically the preceding and following phones, which yields better alignments.

A pass of alignment is repeated after each training stage, both to improve the match between the text transcriptions and the audio and to make sure each new model starts from up-to-date alignments. Standard delta+delta-delta and LDA+MLLT training algorithms are used to obtain better alignments.

Training of triphone models takes two parameters into consideration: the number of leaves in the decision tree (HMM states) and the total number of Gaussians across all states in the model, which are used to fine-tune the model for the best alignments.
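As a sketch of where these two parameters appear in practice, a hypothetical next training stage (not part of the run.sh above) using the LDA+MLLT algorithm mentioned earlier could look like the following; the leaf and Gaussian counts are illustrative and should be tuned to your corpus size:

```shell
# Hypothetical tri2 stage: LDA+MLLT training on top of the tri1 alignments.
# 2500 = decision-tree leaves, 15000 = total Gaussians (illustrative values).
steps/train_lda_mllt.sh --cmd "$train_cmd" 2500 15000 \
  data/train data/lang exp/tri1_ali exp/tri2 || exit 1
```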

The decoded results are available inside the exp/tri1/decode/scoring/log/ folder.
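To compare the models quickly once run.sh has finished, you can pull the best WER out of each decode directory. This assumes the scoring script wrote wer_N files, one per language-model weight; lower WER is better:

```shell
# Print the best (lowest) WER line from each decode directory, if present.
for dir in exp/mono/decode exp/tri1/decode; do
  if [ -d "$dir" ]; then
    echo "== $dir =="
    grep WER "$dir"/wer_* | sort -k2 -n | head -1
  fi
done
```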

Good job :-D. You have now developed a very simple ASR system with your own data. If you run into any problems along the way, drop a comment.

