Countermeasures for Automatic Speaker Verification Replay Spoofing Attack : On Data Augmentation, Feature Representation, Classification and Fusion

Abstract

The ongoing ASVspoof 2017 challenge aims to detect replay attacks for text dependent speaker verification. In this paper, we propose multiple replay spoofing countermeasure systems, with some of them boosting the CQCC-GMM baseline system after score level fusion. We investigate different steps in the system building pipeline, including data augmentation, feature representation, classification and fusion. First, in order to augment training data and simulate the unseen replay conditions, we converted the raw genuine training data into replay spoofing data with parametric sound reverberator and phase shifter. Second, we employed the original spectrogram rather than CQCC as input to explore the end-to-end feature representation learning methods. The spectrogram is randomly cropped into fixed size segments, and then fed into a deep residual network (ResNet). Third, upon the CQCC features, we replaced the subsequent GMM classifier with deep neural networks including fully-connected deep neural network (FDNN) and Bidirectional Long Short Term Memory neural network (BLSTM). Experiments showed that data augmentation strategy can significantly improve the system performance. The final fused system achieves to 16.39% EER on the test set of ASVspoof 2017 for the common task.