Fast2Vec, a modified model of FastText that enhances semantic analysis in topic evolution

Authors: Ayu Pertiwi, Azhari Azhari, Sri Mulyana

Fast2Vec combines semantic embedding and subword modeling for dynamic topic evolution (branded as a hybrid AI-enhanced topic model).

Problem and Challenge

    Semantic topic modeling struggles to capture nuanced word meanings, particularly with negation, rare words, and synonymy. Standard models such as LDA and DTM struggle with semantic coherence, while embedding models such as Word2Vec and FastText are limited by out-of-vocabulary (OOV) words or by context insensitivity, as shown in Figure 1.

    Figure 1. Visualizing Semantic Limitations in Traditional Topic Models

    Goal of Experimentation

      The goal is to develop Fast2Vec, a hybrid word embedding model that integrates Word2Vec and FastText to improve semantic accuracy in dynamic topic modeling, and to use the improved word representations to track topic trends and evolution patterns.

      Methods

        Fast2Vec combines Word2Vec and FastText embeddings through weighted summation (weight = 0.5). DTM models topics over time, while UMAP and Affinity Propagation (AP) support semantic clustering. Semantic similarity is evaluated using cosine similarity together with Spearman and Pearson correlation.
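
The weighted summation and the cosine similarity measure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `combine_embeddings` and `cosine_similarity` are hypothetical, the form `weight * word2vec + (1 - weight) * fasttext` is an assumed reading of "weighted summation", and the fallback to the FastText vector when Word2Vec has no entry for a word is an assumption about how OOV words might be handled.

```python
import math


def combine_embeddings(w2v_vec, ft_vec, weight=0.5):
    """Blend a Word2Vec vector and a FastText vector of equal length.

    Assumption: the blend is weight * w2v + (1 - weight) * ft.
    Assumption: if the word is OOV for Word2Vec (w2v_vec is None),
    the FastText subword vector is used unchanged.
    """
    if w2v_vec is None:
        return list(ft_vec)
    if len(w2v_vec) != len(ft_vec):
        raise ValueError("vectors must have the same dimensionality")
    return [weight * a + (1.0 - weight) * b for a, b in zip(w2v_vec, ft_vec)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy 2-dimensional vectors, for illustration only.
blended = combine_embeddings([1.0, 0.0], [0.0, 1.0], weight=0.5)
oov_vector = combine_embeddings(None, [0.2, 0.4])
```

With a weight of 0.5 both sources contribute equally, so the blended vector is simply the midpoint of the two input vectors.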

          System Architecture

          The system workflow, shown in Figure 2, comprises data preprocessing, Fast2Vec embedding generation, DTM-based topic extraction, dimensionality reduction (UMAP), semantic clustering (Affinity Propagation, AP), and evolution tracking via entropy analysis. This pipeline enables interpretable and adaptive topic modeling.
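
The entropy step at the end of the workflow can be illustrated with a short sketch. The text does not say which entropy is computed or over what distribution, so this example assumes Shannon entropy (base 2) over the weights of a topic's terms, or of the topics themselves, in each time slice. The names `shannon_entropy` and `entropy_trend` and the toy numbers are hypothetical.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy in bits of a non-negative weight vector.

    The weights are normalized to sum to 1 before the entropy is computed.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    entropy = 0.0
    for w in weights:
        p = w / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def entropy_trend(time_slices):
    """Entropy of each time slice, in chronological order."""
    return [shannon_entropy(weights) for weights in time_slices]


# Toy distributions over four terms in three consecutive time slices.
trend = entropy_trend([
    [1.0, 0.0, 0.0, 0.0],      # fully concentrated
    [0.5, 0.5, 0.0, 0.0],      # split across two terms
    [0.25, 0.25, 0.25, 0.25],  # spread evenly
])
```

Under this assumption, rising entropy across slices indicates that a topic's weight is spreading over more terms, while flat entropy indicates a stable distribution.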

          Figure 2. Architecture of System

          Results and Discussion

            Fast2Vec improves similarity by 39.64% over Word2Vec in OOV settings and outperforms FastText by 6.18%. It performs best on 7 of 12 benchmark datasets. The model (Fig. 3) also successfully categorizes topic evolution patterns (diffusion, stability, shift, and moderate fluctuation), validated through entropy-based trend analysis.

            Figure 3. Fast2Vec Outperforms on Semantic Shift and OOV

            Value Proposition

              Fast2Vec offers robust word representations that support fine-grained topic evolution tracking. Its integration of context and subword modeling makes it ideal for applications in NLP research, scientometrics, and semantic analysis over time. It bridges the gap between statistical modeling and semantic precision.
