Improving Badminton Action Recognition Using Spatio-Temporal Analysis and a Weighted Ensemble Learning Model

Authors: Farida Asriani, Azhari Azhari, Wahyono

Problem and Challenge

Recognizing complex and dynamic movements in sports such as badminton remains a major challenge in human action recognition (HAR). Traditional recognition models often struggle with the fast pace of strokes, the similarity of player postures across different strokes, and the temporal dependencies across motion sequences. Moreover, the lack of accurate motion segmentation and contextual reasoning further complicates classification. These challenges are illustrated in Figure 1, which shows overlapping body postures commonly observed during badminton stroke execution.

**Figure 1: Complex overlapping body posture in badminton strokes**

Goal of Experimentation

This research aims to develop an intelligent simulation model that combines agent-based reasoning and ensemble learning to recognize badminton strokes with high precision. The model simulates how players move and perform actions in a spatio-temporal space, mimicking real game conditions and supporting sports analytics.

Methods

This study applies a hybrid approach to improve the recognition of badminton strokes. Spatial features are extracted from 3D skeleton coordinates obtained through pose estimation, with the right hip serving as the anchor point so that joint positions are expressed relative to a consistent origin. Temporal dynamics are captured through Fast Dynamic Time Warping (FDTW), which aligns motion sequences to reflect the progression of movement over time. For classification, an ensemble learning strategy combines Support Vector Machine (SVM), Logistic Regression (LR), and AdaBoost through a weighted soft voting mechanism to boost performance and stability. The entire simulation framework is developed using real video datasets of badminton athletes, with each action segmented into 15 representative frames.
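The anchoring and alignment steps can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the classical exact dynamic time warping recurrence rather than the approximate FDTW named above, the function names are hypothetical, and the clips are made-up two-joint toys with joint 0 standing in for the hip. `RIGHT_HIP = 24` is the right-hip index in MediaPipe Pose's 33-landmark layout; whether the study uses that exact layout is an assumption.

```python
import math

RIGHT_HIP = 24  # right-hip index in MediaPipe Pose's 33-landmark layout


def center_on_joint(frame, anchor=RIGHT_HIP):
    """Translate every joint so the anchor joint sits at the origin."""
    ax, ay, az = frame[anchor]
    return [(x - ax, y - ay, z - az) for x, y, z in frame]


def center_clip(clip, anchor=RIGHT_HIP):
    """Apply anchor-centering to every frame of a clip."""
    return [center_on_joint(frame, anchor) for frame in clip]


def frame_distance(a, b):
    """Sum of Euclidean distances between corresponding joints."""
    return sum(math.dist(p, q) for p, q in zip(a, b))


def dtw_distance(seq_a, seq_b, dist=frame_distance):
    """Exact DTW cost between two frame sequences (O(n*m) dynamic program)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy clips: joint 0 is the anchor, joint 1 moves along x (values are made up).
clip = [[(0.0, 0.0, 0.0), (float(k), 0.0, 0.0)] for k in (1, 2, 3)]
# Same motion, but the whole body is shifted in space.
shifted = [[(x + 5.0, y + 5.0, z + 5.0) for x, y, z in f] for f in clip]
# Same motion, but the first frame is held for one extra time step.
stretched = [clip[0]] + clip

print(dtw_distance(center_clip(clip, 0), center_clip(shifted, 0)))
print(dtw_distance(center_clip(clip, 0), center_clip(stretched, 0)))
```

Centering on the anchor removes the player's position on court from the features, and the warping step tolerates differences in execution speed, which is why both toy comparisons yield zero cost. In the setting described above, each sequence would hold 15 frames of full skeletons rather than three frames of two joints.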

System Architecture

Figure 2 illustrates the system architecture, which begins with pose extraction from RGB video frames using MediaPipe. From the extracted skeletons, spatial features are derived from a key frame, while temporal features are computed using Fast Dynamic Time Warping (FDTW) across 15 frames. These features are then used to train and test a weighted ensemble model combining SVM, Logistic Regression, Random Forest, and AdaBoost classifiers. The final output consists of six classified badminton stroke types.
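The final voting stage can be sketched as follows. This is an illustration of weighted soft voting in general, not the authors' code: the function name, the probability vectors, and the weights are all hypothetical, and three classes are used for brevity where the system described above distinguishes six.

```python
def weighted_soft_vote(probas, weights):
    """Combine per-classifier class probabilities by weighted averaging.

    probas  -- one probability vector per classifier, all of equal length
    weights -- one non-negative weight per classifier
    Returns (index of the winning class, averaged probability vector).
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[c] for p, w in zip(probas, weights)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=avg.__getitem__)
    return winner, avg


# Hypothetical outputs of three base classifiers for one stroke sample.
probas = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
]

# Equal weights versus weights that favor the first classifier (made-up values).
print(weighted_soft_vote(probas, [1, 1, 1]))
print(weighted_soft_vote(probas, [3, 1, 1]))
```

With equal weights the second class wins, while up-weighting the first classifier shifts the decision to the first class, which shows how the weights let more reliable base models dominate. In scikit-learn the same scheme is available through `VotingClassifier` with `voting="soft"` and a `weights` list.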

Results and Discussion

Figure 3 shows an overhead forehand stroke with the 3D skeleton pose estimate used to extract spatio-temporal features. The overlaid joints mark the key motion points used for action classification. The accompanying bar chart shows that the weighted ensemble model achieves high scores in accuracy, precision, recall, and F1-score, which demonstrates the model's reliability in recognizing dynamic badminton strokes from skeleton-based input.

**Figure 3: Visualization of Badminton Stroke Recognition: Pose Estimation and Ensemble Evaluation**

Value Proposition

SimBadAI transforms sports action data into intelligent insights:
– Enables coaches and analysts to monitor performance.
– Enhances training feedback with reliable simulation.
– Supports innovative sports systems for future tournaments.

Applicable to sports science, real-time feedback, augmented coaching, and academic research.