Multilingual Sentiment Analysis Using a Single LSTM Architecture: A Comparative Study on Turkish and English Datasets

Asım Selvi; Emre Satır

JOURNAL OF INTELLIGENT SYSTEMS WITH APPLICATIONS

Year: 2025, Volume: 8, Number: 2
Published: Jan 18, 2026

Asım Selvi ⁽¹⁾, Emre Satır ⁽²⁾

(1) Department of Software Engineering, Izmir Katip Celebi University, Izmir, Turkey

(2) Department of Computer Engineering, Izmir Katip Celebi University, Izmir

Abstract

Sentiment analysis has become an essential task in Natural Language Processing (NLP), particularly with the growing availability of multilingual textual data. While most studies in the literature focus on monolingual models trained separately for each language, the present study proposes a unified deep learning–based framework that performs sentiment analysis on both Turkish and English texts using a single LSTM architecture. Two datasets were employed: the publicly available IMDb movie review dataset for English and a manually labeled dataset of approximately 2,000 Turkish sentences. Texts in both languages were preprocessed, tokenized, and transformed into fixed-length vector representations through embedding and LSTM layers, and binary sentiment classification was performed using a sigmoid activation function. Experimental results demonstrate that the model achieves high accuracy on the English dataset, benefiting from its large and well-balanced structure, while comparatively lower generalization performance is observed on the Turkish dataset due to its smaller size and limited domain coverage. The findings highlight the importance of dataset scale and linguistic characteristics in multilingual sentiment analysis and show that an LSTM-based architecture provides an effective baseline for bilingual sentiment classification. Future work will focus on expanding Turkish data resources and integrating transformer-based multilingual models to improve performance across morphologically rich languages.
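
The pipeline summarized in the abstract (tokenization, fixed-length encoding, an embedding layer, an LSTM layer, and a sigmoid output) can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' implementation: the names `tokenize`, `build_vocab`, `encode`, and `TinyLSTMClassifier`, the layer sizes, the left-padding scheme, the skipping of padding positions, and the shared Turkish/English vocabulary are all assumptions made for illustration. The weights are random and untrained, so the output only shows the data flow, not a meaningful prediction.

```python
import math
import random

def tokenize(text):
    """Lowercase and split on whitespace, replacing punctuation with spaces.
    Note: str.lower() is not Turkish-locale aware ("I" becomes "i", not "ı")."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()

def build_vocab(texts):
    """Map each token to an integer id; 0 is padding, 1 is unknown (assumed convention)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(text, vocab, max_len):
    """Turn text into a fixed-length id sequence: truncate, then left-pad with 0."""
    ids = [vocab.get(tok, 1) for tok in tokenize(text)][:max_len]
    return [0] * (max_len - len(ids)) + ids

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class TinyLSTMClassifier:
    """Hypothetical embedding -> LSTM -> sigmoid classifier (forward pass only)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.embedding = rand_matrix(vocab_size, embed_dim, rng)
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {gate: rand_matrix(hidden_dim, embed_dim + hidden_dim, rng)
                  for gate in self.GATES}
        self.b = {gate: [0.0] * hidden_dim for gate in self.GATES}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def predict_proba(self, ids):
        """Return the sigmoid output in (0, 1) for one encoded sequence."""
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for token_id in ids:
            if token_id == 0:
                continue  # skip padding positions (a simple form of masking)
            z = self.embedding[token_id] + h  # list concatenation [x_t, h_{t-1}]
            pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
                   for gate in self.GATES}
            i_gate = [sigmoid(v) for v in pre["input"]]
            f_gate = [sigmoid(v) for v in pre["forget"]]
            o_gate = [sigmoid(v) for v in pre["output"]]
            cand = [math.tanh(v) for v in pre["candidate"]]
            c = [f * ck + i * g for f, ck, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(ck) for o, ck in zip(o_gate, c)]
        logit = sum(w * hk for w, hk in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)

# Usage: one shared vocabulary and one model for both languages.
corpus = ["This movie was great", "Bu film çok kötüydü"]
vocab = build_vocab(corpus)
model = TinyLSTMClassifier(vocab_size=len(vocab))
for sentence in corpus:
    ids = encode(sentence, vocab, max_len=6)
    print(sentence, ids, round(model.predict_proba(ids), 3))
```

In this sketch, sharing a single vocabulary and a single set of weights across both languages is what makes the architecture "single"; a practical system would also need a training loop and, most likely, an established deep learning library in place of the hand-written cell.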