This study presents an ensemble-based framework for speaker identification using MFCC features extracted from an emotional speech corpus. Speaker identification is performed separately for each emotion as well as on the combined dataset to examine how emotional variability influences speaker-discriminative acoustic cues. Three classical classifiers, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Random Forests (RF), are integrated through a meta-classifier-based decision fusion strategy, where an SVM meta-learner combines the complementary decision boundaries learned by the base models. By aggregating classifier decisions rather than raw feature representations, the proposed fusion mechanism enhances robustness against emotional variations and strengthens class separation in the speaker space. The system is evaluated using accuracy, weighted and macro-averaged precision, recall, F1-scores, and confusion matrices, providing a comprehensive assessment of model behavior under different emotional conditions. The fusion framework demonstrates strong performance, achieving an accuracy of 96.17%, highlighting its effectiveness in capturing reliable speaker-discriminative patterns across emotional contexts. Further analysis using the Friedman and Nemenyi post-hoc tests statistically validates the significance of performance differences among the individual classifiers and the fused ensemble, confirming the superiority of the proposed decision-fusion approach for emotion-resilient speaker identification.
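The decision-fusion architecture described above corresponds to a standard stacking ensemble. The following is a minimal illustrative sketch, assuming scikit-learn's StackingClassifier and librosa for MFCC extraction; the feature dimensionality (n_mfcc=13), mean-pooling of frame-level coefficients, and all hyperparameters are assumptions for illustration, not values reported in the paper.

```python
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mfcc_features(wav_path, n_mfcc=13):
    """Mean-pool frame-level MFCCs into one fixed-length vector per utterance."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Base learners: SVM, KNN, and RF, each trained on the MFCC vectors.
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
]

# The SVM meta-learner fuses base-classifier decisions (class-probability
# vectors), not the raw MFCC features; cv=5 generates out-of-fold base
# predictions so the meta-learner is not fit on leaked training labels.
fusion = StackingClassifier(
    estimators=base_learners,
    final_estimator=SVC(kernel="rbf"),
    stack_method="predict_proba",
    cv=5,
)

# X: per-utterance MFCC vectors, y: speaker labels
# fusion.fit(X_train, y_train); fusion.predict(X_test)
```

Stacking on decision outputs rather than concatenated features keeps the meta-learner's input dimensionality tied to the number of speaker classes, which is what lets the fusion layer reconcile disagreements among base models across emotional conditions. The statistical validation step can likewise be sketched with scipy.stats for the Friedman omnibus test and the scikit-posthocs package for the Nemenyi post-hoc comparison; the score matrix below is a placeholder layout, not the paper's reported results.

```python
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp

# Rows: cross-validation folds; columns: per-fold accuracy of each model.
# Placeholder values only, for demonstrating the test layout.
scores = np.array([
    # SVM   KNN    RF    Fusion
    [0.91, 0.88, 0.90, 0.95],
    [0.92, 0.87, 0.91, 0.96],
    [0.90, 0.89, 0.92, 0.96],
    [0.93, 0.88, 0.90, 0.97],
    [0.91, 0.86, 0.91, 0.96],
])

# Omnibus test: do the models' per-fold scores differ significantly?
stat, p = friedmanchisquare(*scores.T)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")

# Pairwise Nemenyi post-hoc p-values (rows are blocks, columns are groups).
print(sp.posthoc_nemenyi_friedman(scores))
```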
Debasis Mohanta and Jainath Yadav
Pages: 343-355
DOI: 10.5281/zenodo.17993927