LSTM-Autoencoder-Based Anomaly Detection for Indoor Air Quality Time-Series Data

Published in: IEEE Sensors Journal (Volume: 23, Issue: 4, February 2023)

Authors: Yuanyuan Wei, Julian Jang-Jaccard, Wen Xu, Fariza Sabrina, Seyit Camtepe, Mikael Boulic

DOI: https://doi.org/10.1109/JSEN.2022.3230361

Summary Contributed by: Kamalesh Tripathy

Maintaining indoor air quality (IAQ) is crucial for human health, productivity, and work efficiency, particularly for children, who spend most of their time at school. Higher concentrations of pollutants such as carbon dioxide (CO₂) can lead to adverse health effects, including difficulty concentrating, reduced memory, headaches, and fatigue. Monitoring IAQ is challenging because readings fluctuate and sensor data quality varies. Previous AI-based approaches to anomaly detection in IAQ data have required multiple correlated data points. Traditional methods applied to time-series data, such as ARIMA and KNN, have limitations, including manual feature extraction, high costs, and sensitivity to outliers.

In this context, Long Short-Term Memory (LSTM) networks can predict future values based on past patterns, enhancing accuracy and decision-making. LSTM, an extension of Recurrent Neural Networks (RNNs), provides long-term memory by retaining information from earlier time steps for the current node. It typically does so through a memory cell and three gates (input, forget, and output) that regulate the flow of information.
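The gating mechanism described above can be sketched as a single LSTM time step in NumPy. This is a minimal illustration of the standard textbook LSTM cell, not the paper's implementation; the function names (`lstm_step`, `run_lstm`), the stacked weight layout, and the random initialisation are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4*H, D), U: (4*H, H), b: (4*H,). The four blocks are stacked in the
    order input gate, forget gate, cell candidate, output gate (an assumed
    layout chosen for this sketch).
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])           # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])       # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * H:3 * H])   # candidate values for the cell state
    o = sigmoid(z[3 * H:4 * H])   # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g      # updated long-term memory
    h_t = o * np.tanh(c_t)        # updated hidden state
    return h_t, c_t

def run_lstm(sequence, hidden_size, seed=0):
    """Run an untrained, randomly initialised LSTM over a (T, D) sequence."""
    rng = np.random.default_rng(seed)
    D = sequence.shape[1]
    W = rng.normal(scale=0.1, size=(4 * hidden_size, D))
    U = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size))
    b = np.zeros(4 * hidden_size)
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h, c

# Ten timesteps of a single feature, e.g. scaled CO2 readings (synthetic).
seq = np.linspace(0.0, 1.0, 10).reshape(10, 1)
h_final, c_final = run_lstm(seq, hidden_size=4)
```

The cell state `c_t` is what carries information across many time steps: the forget gate scales the old state and the input gate controls how much of the new candidate is added.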

This research proposes a hybrid deep learning model that combines LSTM and autoencoder (AE) to detect anomalous data points in IAQ datasets. The model demonstrates improved performance, particularly for complex autocorrelation sequences within large datasets and unpredictable data distributions. AE, an unsupervised neural network, learns an encoding of unlabelled data while filtering out noise. It comprises input, hidden, and output layers: it encodes the input, decodes it again, and is trained to minimize the reconstruction loss between the input and the decoded output.

The proposed LSTM-AE model serves as a time-series data analysis tool. First, it transforms the input dataset into fixed-size, time-ordered sequences of a specified window length. Next, multiple LSTM units work with the AE to identify the relevant features through encoding. The decoding operation then restores the input sequence structure to generate the output data. The model calculates the reconstruction error between input and output and uses it to determine an optimal threshold for anomaly detection across different time sequences. During the training phase, the objective is to minimize the reconstruction error and establish an optimal threshold for detection during the test phase. In the testing phase, a sequence of ten data points (ten timesteps) is fed into the trained model, which labels data points as anomalous or normal depending on whether the loss value exceeds the threshold.
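The windowing and thresholding steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: the stride of 1, the mean absolute error as the loss, and the "maximum training error" threshold rule are all assumed for illustration, and `standin_reconstruct` is a hypothetical placeholder that stands in for the trained LSTM-AE (it simply reconstructs each window as its own mean).

```python
import numpy as np

def make_windows(series, window_length):
    """Slice a 1-D series into overlapping fixed-size windows (stride 1, assumed)."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window_length + 1
    if n <= 0:
        raise ValueError("series is shorter than window_length")
    return np.stack([series[i:i + window_length] for i in range(n)])

def reconstruction_error(windows, reconstructed):
    """Mean absolute error per window (the choice of MAE is an assumption)."""
    return np.mean(np.abs(windows - reconstructed), axis=1)

def standin_reconstruct(windows):
    """Hypothetical placeholder for the trained LSTM-AE: window mean, repeated."""
    means = windows.mean(axis=1, keepdims=True)
    return np.repeat(means, windows.shape[1], axis=1)

def fit_threshold(train_errors):
    """Assumed rule: the largest error observed on normal training data."""
    return float(np.max(train_errors))

def detect(windows, threshold, reconstruct=standin_reconstruct):
    """Flag a window as anomalous when its error exceeds the threshold."""
    errors = reconstruction_error(windows, reconstruct(windows))
    return errors > threshold

# Synthetic CO2-like readings in ppm; not data from the paper.
rng = np.random.default_rng(0)
train = 400.0 + rng.normal(0.0, 1.0, size=200)
test = train[:50].copy()
test[25] = 1200.0  # injected spike

train_windows = make_windows(train, window_length=10)
train_errors = reconstruction_error(train_windows, standin_reconstruct(train_windows))
threshold = fit_threshold(train_errors)
flags = detect(make_windows(test, window_length=10), threshold)
```

In this toy run, only the ten windows that contain the injected spike are flagged. With a real model, `standin_reconstruct` would be replaced by the trained network's forward pass, and the threshold rule would be whatever the training procedure selects.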

The study uses a CO₂ dataset obtained from a real-world deployment in multiple schools to investigate the correlation between CO₂ levels, weather conditions, and student performance. Two datasets are used, one for the training phase and one for the testing phase. The performance metrics considered are classification accuracy, precision, recall, and F1 score (the harmonic mean of precision and recall).
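For reference, the four metrics can be computed from confusion-matrix counts as in the short sketch below. The function name `classification_metrics` and the example counts are illustrative assumptions, not values from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts.

    Here a "positive" is an anomalous data point. Ratios with a zero
    denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative counts only: 8 true positives, 2 false positives,
# 2 false negatives, 88 true negatives.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
```

Because anomalies are usually rare, accuracy alone can look high even when many anomalies are missed, which is why precision, recall, and F1 are reported alongside it.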

The proposed model correctly detects 1888 of the 2100 abnormal data samples, with no false positives. Its performance decreases as the time window length increases, with the best result, 95.0%, obtained at a window length of 10. Anomalous data points are detected using a threshold of 1.742. The model performs well on the Dunedin CO₂ dataset, with an accuracy of 99.50% and a precision of 100%.
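As a worked check on these figures, the detection counts imply a recall of roughly 89.9%. If the reported precision of 100% is taken to mean zero false positives (an assumption made here), the implied F1 score is roughly 0.947. Both values are derived from the counts quoted in this summary rather than taken from the paper.

```latex
\begin{align*}
\text{Recall} &= \frac{TP}{TP + FN} = \frac{1888}{2100} \approx 0.899 \\
F_1 &= \frac{2\,TP}{2\,TP + FP + FN}
     = \frac{2 \cdot 1888}{2 \cdot 1888 + 0 + 212}
     = \frac{3776}{3988} \approx 0.947
\end{align*}
```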
