Enhanced CNN-LSTM Feature Extraction and Ensemble Learning for Anomaly Detection in Photovoltaic Data
Abstract
This study proposes an anomaly detection framework that combines CNN-LSTM feature extraction with a boosting-based ensemble strategy to improve the reliability of photovoltaic (PV) system monitoring. Real multi-source PV operational data are first preprocessed with the ISODATA clustering algorithm, which adjusts the number of clusters automatically and reduces redundancy. Principal component analysis (PCA) is then applied to reduce data dimensionality while retaining the dominant sources of variance. A hybrid CNN-LSTM network is developed in which the CNN layers extract spatial features from heterogeneous PV measurements and the LSTM layers capture temporal dependencies in power sequences. Based on the learned representations, an ensemble model integrates the outputs of a Gaussian Mixture Model (GMM), Isolation Forest (IF), and an Interquartile Range (IQR) rule through a boosting-inspired weighting mechanism to enhance robustness under complex operating conditions. Experiments on real PV datasets show that the proposed method achieves nearly 97% anomaly detection accuracy, with an average F1-score of 0.89 ± 0.03 and a recall of 0.91 ± 0.02. Compared with single-model baselines, the framework performs more stably and keeps the false positive rate below 2.1%, demonstrating its practical value for real-world PV anomaly detection.
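To make the ensemble step concrete, the following is a minimal sketch of one way a boosting-inspired weighting of detector outputs could work. It is not the authors' implementation: the AdaBoost-style weight formula, the majority-of-weight decision rule, the Tukey multiplier k = 1.5, and all function names and sample values are assumptions introduced for illustration. The GMM and Isolation Forest outputs are represented by hypothetical placeholder flag lists, and only the IQR rule is computed.

```python
import math
import statistics


def iqr_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule; k is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [int(v < low or v > high) for v in values]


def boosting_style_weights(detector_flags, labels, eps=1e-6):
    """Assumed AdaBoost-like weight per detector: 0.5 * ln((1 - err) / err)."""
    weights = []
    for flags in detector_flags:
        err = sum(f != y for f, y in zip(flags, labels)) / len(labels)
        err = min(max(err, eps), 1 - eps)  # keep the logarithm finite
        # Detectors that are worse than chance get zero weight.
        weights.append(max(0.0, 0.5 * math.log((1 - err) / err)))
    return weights


def weighted_vote(detector_flags, weights):
    """Flag a sample when its weighted anomaly votes exceed half the total weight."""
    threshold = sum(weights) / 2
    n_samples = len(detector_flags[0])
    return [
        int(sum(w * flags[i] for w, flags in zip(weights, detector_flags)) > threshold)
        for i in range(n_samples)
    ]


# Hypothetical validation window of PV power readings with one labelled fault.
power = [5.0, 5.1, 4.9, 5.2, 5.0, 0.3, 5.1, 4.8]
labels = [0, 0, 0, 0, 0, 1, 0, 0]

# Placeholder outputs standing in for GMM and Isolation Forest detectors.
gmm_flags = [0, 0, 0, 0, 0, 1, 0, 1]
if_flags = [0, 1, 0, 0, 0, 1, 0, 0]

detectors = [gmm_flags, if_flags, iqr_flags(power)]
weights = boosting_style_weights(detectors, labels)
ensemble_flags = weighted_vote(detectors, weights)
```

In this sketch, a detector with a lower validation error receives a larger weight, so an isolated false alarm from one detector is outvoted while a fault flagged by all three is kept. In practice the weights would be estimated on held-out labelled data rather than on the window being scored.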
