International Journal of Electronics and Computer Applications

Volume: 1 Issue: 1

  • Open Access
  • Original Article

Multilingual Spoken Language Recognition Using Machine Learning Algorithms

Pratik Nandkumar Gore1, Subhash Tukaram Bawadane1, Sahil Sanjay Ingale1, S G Watve1

1 Electronic & Telecommunication, P.E.S Modern College of Engineering, Pune, India

*Corresponding author email: [email protected]

Year: 2024, Page: 15-19

Received: Feb. 22, 2024 Accepted: May 10, 2024 Published: May 22, 2024


Machine learning algorithms are being studied as a way to recognize and segment languages in audio recordings. This technology has great potential to improve communication with, and understanding of, different language communities. The main goal of multilingual recognition is to develop models that accurately identify the language being spoken, which is especially useful in applications such as call centers and voice assistants. The speech corpus used here consists of speech samples drawn from online podcasts, audiobooks and similar sources. Every utterance in the corpus has the same duration of 10 seconds. The corpus is divided into two parts: a larger one used as the training set and a smaller one used as the test set. An acoustic model that uses the mean values of the BFCC appears to be an appropriate method for spoken language recognition. The system uses Convolutional K Nearest Neighbors (KNN) to solve the multiclass classification problem. The aim of the project is to recognize Punjabi, Hindi and Gujarati.
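
The pipeline outlined in the abstract (one mean feature vector per utterance, a train/test split, and nearest-neighbour voting over three languages) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes plain KNN with Euclidean distance, and it replaces real BFCC extraction with synthetic Gaussian frames so that it runs without audio data. The function names (mean_feature_vector, knn_predict, synthetic_utterance, build_dataset), the 13 coefficients per frame, the 50 frames per utterance, and the split sizes are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter

LANGUAGES = ["Punjabi", "Hindi", "Gujarati"]


def mean_feature_vector(frames):
    """Average per-frame coefficient vectors into one vector per utterance."""
    n_frames = len(frames)
    return [sum(column) / n_frames for column in zip(*frames)]


def knn_predict(train_vectors, train_labels, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def synthetic_utterance(rng, center, n_frames=50, n_coeffs=13):
    """Assumed stand-in for real cepstral frames: noise around a per-language center."""
    return [[rng.gauss(center, 1.0) for _ in range(n_coeffs)] for _ in range(n_frames)]


def build_dataset(rng, utterances_per_language):
    """Build (vectors, labels) with one mean feature vector per synthetic utterance."""
    vectors, labels = [], []
    for index, language in enumerate(LANGUAGES):
        for _ in range(utterances_per_language):
            frames = synthetic_utterance(rng, center=5.0 * index)
            vectors.append(mean_feature_vector(frames))
            labels.append(language)
    return vectors, labels


rng = random.Random(0)
train_X, train_y = build_dataset(rng, utterances_per_language=20)  # larger training part
test_X, test_y = build_dataset(rng, utterances_per_language=5)     # smaller test part

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"accuracy on synthetic test set: {accuracy:.2f}")
```

With real recordings, synthetic_utterance would be replaced by a feature extractor that returns one coefficient vector per frame of each 10-second utterance. The averaging and voting steps would stay the same.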

Keywords: Multilingual Spoken Language Recognition Using Machine Learning Algorithms


Cite this article

Gore PN, Tukaram Bawadane S, Sanjay Ingale S, Watve SG. (2024). Multilingual Spoken Language Recognition Using Machine Learning Algorithms. International Journal of Electronics and Computer Applications. 1(1): 15-19.