How to Build a Speech Recognition System for the Sinhala language with Kaldi (Part 1)

Hirunika Karunathilaka
Apr 5, 2020

Hello! This tutorial series walks through how to develop a speech recognition system for the Sinhala language. I will try to address every issue I came across and include the references I followed.

Introduction

Sinhala is a low-resource language, and at the same time it is a language with rich lexical variety. So, if you are thinking of developing an ASR (Automatic Speech Recognition) system for any low-resource language, this tutorial series should help.

I am using the Kaldi speech recognition toolkit. It has very good documentation (https://kaldi-asr.org/doc/). At first it may look complex, but it becomes really helpful later.

Kaldi is an open-source toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. It aims to provide software that is flexible and extensible. Kaldi tools support GPU processing with CUDA as well as distributed parallel processing on clusters such as Grid Engine.

Another interesting fact is that Kaldi has a Google group (https://groups.google.com/forum/#!forum/kaldi-help) where you can ask (valid :-P ) questions you come across when working with Kaldi. The authors actually answer the questions within a very short time, which I found very helpful.

Let’s get started.

Download & Install Kaldi

I downloaded and installed Kaldi on a machine running Ubuntu 16.04 LTS. These are the quick steps you need to follow.

1. First, you need to clone the Kaldi GitHub project to your PC. This might take a few minutes.

$ git clone https://github.com/kaldi-asr/kaldi.git

Once the cloning is finished, go inside the kaldi folder. The INSTALL file there explains the installation steps. Cat the file to see what's included.

$ cat INSTALL

2. Now cd to the tools folder and install the prerequisites for Kaldi. First, check the dependencies as below and install any that are reported missing.

$ extras/check_dependencies.sh

Then run,

$ make

This takes a long time to finish. If you have multiple CPU cores, you can speed this up by telling make how many jobs to run in parallel. For example, to run 5 jobs at once:

$ make -j 5
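Rather than hard-coding the number, the job count can be taken from the machine itself. This is a minimal sketch; the variable name njobs is my own choice, and it only prints the command so you can check it before running it in the tools folder.

```shell
# Count the available CPU cores; fall back to 1 if neither tool exists
njobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Show the command that would be run inside kaldi/tools
echo "make -j $njobs"
```

On a shared machine you may want to use fewer jobs than the number of cores so that other users are not starved.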

Now check the dependencies again, and it should say “all ok”.

Then go to the kaldi/src folder and configure as below:

$ ./configure --shared
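Configuring alone does not compile anything. The INSTALL file in the src folder lists two further make steps after configure, so the whole sequence can be sketched roughly as below. The guard around it and the njobs variable are my own additions for illustration, and the compilation itself can take quite a while.

```shell
# Number of parallel jobs; falls back to 1 if nproc is unavailable
njobs=$(nproc 2>/dev/null || echo 1)

# Only build when we are really inside kaldi/src
if [ -x ./configure ]; then
  ./configure --shared &&
  make depend -j "$njobs" &&
  make -j "$njobs"
else
  echo "run this from kaldi/src (no ./configure here)"
fi
```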

Kaldi Folder Structure

That completes the installation. The top level of the kaldi folder contains, among others, the egs, src, and tools folders.

Inside the egs folder, you will find various sample projects, and of course you can have a look around and get ideas. Our project folder will also reside under this egs folder.

For this tutorial, we’ll just set up the folder structure for our project, and in the next tutorial, we’ll create the basic scripts.

Go inside the egs folder and create your folder there. Mine is “SinhalaASR”.

$ cd kaldi/egs
$ mkdir SinhalaASR

Go inside your folder and execute the following commands to soft link the required Kaldi scripts to your project and copy path.sh. wsj (the famous Wall Street Journal recipe) is one of the sample projects under the egs folder.

$ cd SinhalaASR
$ ln -s ../wsj/s5/steps .
$ ln -s ../wsj/s5/utils .
$ ln -s ../../src .

$ cp ../wsj/s5/path.sh .

After that’s done, you will see that the steps, utils, and src links and the path.sh file have been created inside your project.
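A quick way to confirm that the links resolve is a small check like the one below. It is only a sketch: check_links is a hypothetical helper name of my own, not a Kaldi script. It relies on the fact that the -e test follows symbolic links, so a link pointing at a wrong path is reported as missing.

```shell
# check_links DIR: report whether steps, utils, src and path.sh exist in DIR.
# Returns the number of missing items (0 means everything is in place).
check_links() {
  dir=$1
  missing=0
  for name in steps utils src; do
    if [ -e "$dir/$name" ]; then
      echo "ok: $name"
    else
      echo "missing or broken: $name"
      missing=$((missing + 1))
    fi
  done
  if [ ! -f "$dir/path.sh" ]; then
    echo "missing: path.sh"
    missing=$((missing + 1))
  fi
  return "$missing"
}

# Run it against the current folder (expected to be SinhalaASR)
check_links . || echo "some items need fixing"
```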

We have to modify the path.sh bash file since we are not going to create another folder inside SinhalaASR. In wsj, the scripts live in an s5 folder inside wsj, so its path.sh reaches the Kaldi root by going up three levels. Our project sits directly under egs, so we only need to go up two.

It is better to install a good text editor to edit bash files. I’m using Visual Studio Code.

Change the path to KALDI_ROOT as below. I have added comments to the script as well.

# Defining Kaldi root directory
export KALDI_ROOT=`pwd`/../..
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
# Setting paths to useful tools
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
# Variable needed for proper data sorting
export LC_ALL=C
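To see why the number of ../ segments matters, the sketch below recreates the two layouts in a throwaway directory and resolves each relative path. Nothing here touches a real Kaldi checkout; the temporary folders and the variable names flat and nested are assumptions made purely for illustration.

```shell
# Recreate kaldi/egs/SinhalaASR and kaldi/egs/wsj/s5 in a temporary directory
tmp=$(mktemp -d)
mkdir -p "$tmp/kaldi/egs/SinhalaASR" "$tmp/kaldi/egs/wsj/s5"

# Our project is two levels below the Kaldi root
flat=$(cd "$tmp/kaldi/egs/SinhalaASR/../.." && basename "$PWD")

# The wsj recipe is three levels below it because of the extra s5 folder
nested=$(cd "$tmp/kaldi/egs/wsj/s5/../../.." && basename "$PWD")

echo "$flat $nested"
rm -rf "$tmp"
```

Both paths end up at the kaldi folder, so the script prints kaldi twice. If path.sh is copied from wsj without this change, KALDI_ROOT points one level too high and the check for common_path.sh fails.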

The next few steps in this tutorial will explain how to create other required folders for the project.

Staying inside SinhalaASR, create folders to store experiments, configurations, and data. Inside the data folder, we store the speech data for the training set (train), validation set (dev), and testing set (test), local information related to our speech data (local), and language-related information (lang).

$ mkdir exp
$ mkdir conf
$ mkdir data
$ cd data
$ mkdir train
$ mkdir dev
$ mkdir test
$ mkdir local
$ mkdir lang
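The same layout can also be created without changing directories. This is a minimal sketch that assumes it is run from the kaldi/egs folder; mkdir -p creates missing parent folders and does not complain if a folder already exists, so it is safe to run more than once.

```shell
# Project folder name, relative to kaldi/egs
project_dir=SinhalaASR

# Experiment output and configuration folders
mkdir -p "$project_dir/exp" "$project_dir/conf"

# One data subfolder per purpose
for d in train dev test local lang; do
  mkdir -p "$project_dir/data/$d"
done

# List what was created
find "$project_dir" -type d | sort
```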

Good job :-D. For this tutorial, that’s all folks. See my second tutorial about how to prepare the language and acoustic resources for the project.

Special thanks go to the University of Colombo School of Computing LTRL for providing the speech corpus, language resources, and computational resources.
