Fast2Vec, a modified model of FastText that enhances semantic analysis in topic evolution

Authors: Ayu Pertiwi, Azhari Azhari, Sri Mulyana

Fast2Vec combines semantic embeddings and subword modeling to track dynamic topic evolution (branded as a hybrid AI-enhanced topic model).

Problem and Challenge

Semantic topic modeling struggles to capture nuanced word meanings, particularly with negation, rare words, and synonymy. Standard models such as LDA and DTM struggle with semantic coherence, while embedding models such as Word2Vec and FastText are limited by out-of-vocabulary (OOV) words or by context insensitivity, as shown in Figure 1.

Figure 1. Visualizing Semantic Limitations in Traditional Topic Models

Goal of Experimentation

To develop Fast2Vec, a hybrid word embedding model that integrates Word2Vec and FastText, aiming to enhance semantic accuracy in dynamic topic modeling. The objective is to track topic trends and evolution patterns using improved word representations.

Methods

Fast2Vec combines Word2Vec and FastText embeddings through weighted summation with a weight of 0.5. DTM is used to model topics over time, while UMAP and Affinity Propagation support semantic clustering. Semantic similarity is evaluated using cosine similarity together with Spearman and Pearson correlation.
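The weighted summation and the cosine similarity measure can be illustrated with a minimal sketch. It assumes that both embeddings have the same dimensionality and that the weight of 0.5 is applied to the Word2Vec vector, with the remainder applied to the FastText vector. The function names combine_embeddings and cosine_similarity are hypothetical and are not taken from the authors' implementation.

```python
import math


def combine_embeddings(w2v_vec, ft_vec, weight=0.5):
    """Weighted sum of two word vectors (hypothetical helper).

    Assumption: `weight` scales the Word2Vec vector and (1 - weight)
    scales the FastText vector; both vectors share one dimensionality.
    """
    if len(w2v_vec) != len(ft_vec):
        raise ValueError("Both embeddings must have the same dimensionality")
    return [weight * a + (1.0 - weight) * b for a, b in zip(w2v_vec, ft_vec)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # design choice for this sketch, not from the source
    return dot / (norm_u * norm_v)


# Toy 2-dimensional vectors, for illustration only.
fused = combine_embeddings([1.0, 0.0], [0.0, 1.0])
similarity = cosine_similarity(fused, [1.0, 1.0])
```

With equal weights, the fused vector is simply the element-wise average of the two inputs, so neither model dominates the combined representation.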

System Architecture

The system workflow, shown in Figure 2, includes data preprocessing, Fast2Vec embedding generation, DTM-based topic extraction, dimensionality reduction (UMAP), semantic clustering (Affinity Propagation), and evolution tracking via entropy analysis. This pipeline enables interpretable and adaptive topic modeling.
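The entropy analysis step can be sketched as follows. The exact quantity the authors compute is not specified here, so this example assumes Shannon entropy over a topic's word-probability distribution in each time slice; the names shannon_entropy and entropy_trend, and the toy distributions, are hypothetical.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy (in bits) of a non-negative weight vector.

    The weights are normalized to sum to 1 before the entropy is computed.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("Weights must have a positive sum")
    entropy = 0.0
    for w in weights:
        p = w / total
        if p > 0:  # terms with p == 0 contribute nothing
            entropy -= p * math.log2(p)
    return entropy


def entropy_trend(time_slices):
    """Entropy per time slice (assumed: one distribution per slice)."""
    return [shannon_entropy(dist) for dist in time_slices]


# Toy distributions over four words across three time slices.
trend = entropy_trend([
    [1.0, 0.0, 0.0, 0.0],      # fully concentrated
    [0.5, 0.5, 0.0, 0.0],      # split over two words
    [0.25, 0.25, 0.25, 0.25],  # uniform
])
```

Under this assumption, rising entropy across slices indicates a topic spreading over more words, while flat entropy indicates a stable topic.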

Results and Discussion

Fast2Vec improves similarity by 39.64% over Word2Vec in OOV settings and outperforms FastText by 6.18%. It performs best on 7 of 12 benchmark datasets. The model (Fig. 3) also successfully categorizes topic evolution patterns (diffusion, stability, shift, and moderate fluctuation), validated through entropy-based trend analysis.

Figure 3. Fast2Vec Outperforms on Semantic Shift and OOV

Value Proposition

Fast2Vec offers robust word representations that support fine-grained topic evolution tracking. Its integration of context and subword modeling makes it ideal for applications in NLP research, scientometrics, and semantic analysis over time. It bridges the gap between statistical modeling and semantic precision.