Conv-DMSA: an efficient imputation model for multivariate time series through diagonal mask self-attention
Hao Zhang, Weilong Ding, Qi Yu, Zijian Liu
International Journal of Web Information Systems, Vol. 21, No. 1, pp. 22-36
The proposed model aims to tackle the data quality issues in multivariate time series caused by missing values. It preserves data set integrity by accurately imputing missing data, ensuring reliable analysis outcomes.
The Conv-DMSA model employs a combination of self-attention mechanisms and convolutional networks to handle the complexities of multivariate time series data. The convolutional network is adept at learning features across uneven time intervals through an imputation feature map, while the Diagonal Mask Self-Attention (DMSA) block is specifically designed to capture time dependencies and feature correlations. This dual approach allows the model to effectively address the temporal imbalance, feature correlation and time dependency challenges that are often overlooked in traditional imputation models.
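The diagonal masking idea can be illustrated with a minimal sketch. It assumes the mask works the way diagonally masked attention is commonly implemented: the diagonal of the score matrix is set to negative infinity before the softmax, so a time step cannot attend to itself and must be estimated from the other steps. The abstract does not give the exact formulation, so this is an illustration rather than the authors' implementation. The function name is hypothetical, and queries, keys and values are taken to be the input itself, without learned projections or the convolutional feature map.

```python
import math


def diagonal_masked_self_attention(x):
    """Single-head self-attention with a diagonal mask (illustrative sketch).

    x is a list of T time steps, each a list of d feature values.
    Assumption: Q = K = V = x, i.e. no learned projection matrices.
    Returns (output, weights), where weights[i][j] is the attention
    that step i pays to step j.
    """
    num_steps = len(x)
    if num_steps < 2:
        raise ValueError("need at least two time steps when the diagonal is masked")
    dim = len(x[0])
    scale = math.sqrt(dim)

    output, weights = [], []
    for i in range(num_steps):
        # Scaled dot-product score of step i against every step j.
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / scale
            for j in range(num_steps)
        ]
        # Diagonal mask: step i is not allowed to attend to itself.
        scores[i] = -math.inf
        # Numerically stable softmax over the remaining steps.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Weighted sum of the other steps' values.
        output.append(
            [sum(row[j] * x[j][k] for j in range(num_steps)) for k in range(dim)]
        )
    return output, weights
```

Calling the function on a short series such as `[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]` returns a weight matrix whose diagonal is zero and whose rows each sum to one, so every reconstructed step is a mixture of the other steps only.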
Experiments on two public data sets and one real-project data set demonstrate the adaptability and effectiveness of Conv-DMSA for imputing missing data. The model outperforms the baseline methods on the Root Mean Square Error (RMSE) metric, reducing RMSE by 37.2% to 63.87% compared to other models, which indicates improved accuracy and efficiency in handling missing data in multivariate time series.
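To make the metric concrete, the sketch below shows one way the RMSE and a relative reduction of the kind quoted above could be computed. It assumes a common evaluation protocol for imputation, in which the error is measured only at positions that were deliberately hidden and whose true values are known. The abstract does not describe the protocol, so both helper names and the evaluation mask are assumptions.

```python
import math


def masked_rmse(truth, imputed, eval_mask):
    """RMSE over the positions flagged in eval_mask (1 = evaluate, 0 = skip).

    Assumption: eval_mask marks values that were artificially removed,
    so the ground truth at those positions is available.
    """
    squared_errors = [
        (t - p) ** 2 for t, p, m in zip(truth, imputed, eval_mask) if m
    ]
    if not squared_errors:
        raise ValueError("eval_mask selects no positions")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_rmse, model_rmse):
    """Percentage by which model_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse
```

Under this definition, a model that brings RMSE from 2.0 down to 1.0 achieves a 50% reduction; these numbers are illustrative and are not taken from the paper.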
The Conv-DMSA model introduces a combination of convolutional networks and self-attention mechanisms to missing data imputation. Its use of a diagonal mask within the self-attention block allows for a more nuanced understanding of the data’s temporal and relational aspects. This approach addresses shortcomings of conventional imputation methods for complex multivariate time series data sets. The model’s performance and its capacity to adapt to varying levels of missing data make it a significant contribution to the field.