Development and prospects of adversarial attack and defense technology in deep learning - Dr. Tengda



Since Szegedy proposed adversarial samples [1] and Goodfellow proposed GANs [2], adversarial attacks and the related defense algorithms have become an important topic in the computer vision (CV) field. This article surveys and summarizes state-of-the-art research results on adversarial attack and defense, and comments on the effectiveness of these attack and defense methods.

Researchers have found serious security risks in existing deep learning algorithms: adding specific noise to a benign sample can easily make a normally trained model output a wrong prediction with high confidence, even though the change is hard to perceive visually. This phenomenon is known as an adversarial attack. It is considered a major obstacle to deploying deep learning models in production, and it has inspired broad research on adversarial attack and defense.

According to the threat model, existing adversarial attacks can be divided into white-box, gray-box, and black-box attacks. The difference between the three lies in the information the attacker knows. In the white-box threat model, the attacker is assumed to have complete knowledge of the target model, including its architecture and parameters, and can therefore craft adversarial samples directly on the target model by any means. In the gray-box threat model, the attacker's knowledge is limited to the structure of the target model and query access. In the black-box threat model, the attacker can rely only on the results returned by query access to generate adversarial samples. Within these threat models, researchers have developed many attack algorithms for generating adversarial samples, such as limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS), the fast gradient sign method (FGSM), the basic iterative attack/projected gradient descent (BIA/PGD), the distributionally adversarial attack (DAA), Carlini and Wagner (C&W) attacks, the Jacobian-based saliency map attack (JSMA), and DeepFool.

1. Introduction to adversarial attack and defense

An adversarial sample is a sample formed by adding imperceptible noise to a normal sample. Such a sample can make a CNN model output a wrong prediction with high confidence. As shown in Figure 1, adding noise that is imperceptible to the naked eye to the original image (snow mountains or a puffer fish) is enough to make the model give completely wrong predictions.


Figure 1. Adversarial samples in a classification problem

The same problem arises with biometric algorithms. Figure 2 shows adversarial samples in a face comparison algorithm. After adversarial noise is added, the face comparison score drops sharply from 100 to less than 60. In other words, the face comparison algorithm can no longer match the two face images.


Figure 2. Adversarial samples in face comparison

2. Adversarial attack methods

Methods for generating adversarial samples can be classified in two ways. By the form of the attack, they are divided into black-box and white-box attacks. A black-box attack is carried out with only limited feedback from the neural network's outputs, whereas a white-box attack assumes that the algorithm is fully exposed, including the network structure and gradients. Although only black-box attacks are possible in most real scenarios, many researchers have found that attacks transfer between similar networks, and white-box attacks achieve better results. By goal, attacks are divided into targeted and non-targeted attacks. As the names imply, a targeted attack perturbs the input toward a specific target label or sample, while a non-targeted attack only pushes the input away from its own label. In actual experiments, non-targeted attacks tend to generalize better.


Figure 3. Mainstream adversarial attack methods

Specifically, IPGD (Iterative Projected Gradient Descent) attacks obtain adversarial samples by maximizing the classification loss and directly superimposing the projected gradient, or its direction (sign), onto the input image. GAN-based attacks instead use a generative model to produce adversarial samples. Each approach has its own advantages and disadvantages. Generally speaking, GAN-based attacks have a higher ceiling, while IPGD attacks are more stable and reproducible. Both types of attack are shown in Figure 3. Some of the attack methods are described below.

2.1 Fast Gradient Sign Method (FGSM)

Goodfellow et al. first proposed an efficient non-targeted attack method, the fast gradient sign method (FGSM), which generates adversarial samples within an L∞-norm bound around the benign sample, as shown in Figure 1. FGSM is a typical one-step attack algorithm: it performs a single update along the direction (i.e., the sign) of the gradient of the adversarial loss J(θ, x, y), increasing the loss in the steepest direction. The adversarial sample x' generated by FGSM is expressed as follows:

$$x' = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right)$$


where ε is the magnitude of the perturbation. FGSM can easily be extended to a targeted attack algorithm (targeted FGSM) by descending the gradient of J(θ, x, y'), where y' denotes the target class. If cross-entropy is used as the adversarial loss, this update reduces the cross-entropy between the predicted probability vector and the target probability vector. The update of the targeted attack algorithm can be expressed as:

$$x' = x - \epsilon \cdot \mathrm{sign}\left(\nabla_x J(\theta, x, y')\right)$$


In addition, adding a random perturbation to the benign sample before executing FGSM can improve the performance and diversity of the adversarial samples that FGSM generates.
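The one-step update described in this section can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses a toy logistic-regression model (weights `w`, bias `b`) so that the input gradient of the cross-entropy loss has the closed form (p - y)·w, and it clips the result to [0, 1] as a pixel range. The helper names (`loss_grad_x`, `fgsm`) are hypothetical and do not come from any particular library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def loss_grad_x(w, b, x, y):
    """Gradient of the cross-entropy loss J(theta, x, y) with respect to the
    input x, for the toy model p = sigmoid(w.x + b)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def fgsm(w, b, x, y, eps, targeted=False):
    """One-step FGSM.
    Non-targeted: step along +sign(gradient) of the loss for the true label y.
    Targeted: pass the target label as y and step along -sign(gradient)."""
    g = loss_grad_x(w, b, x, y)
    step = -eps if targeted else eps
    x_adv = [xi + step * sign(gi) for xi, gi in zip(x, g)]
    # Assumption: inputs are pixel intensities, so keep them inside [0, 1].
    return [min(1.0, max(0.0, xi)) for xi in x_adv]

# Toy usage: the true label is 1, and the attack lowers the model's confidence in it.
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, 0.2, 0.7], 1
x_adv = fgsm(w, b, x, y, eps=0.1)
```

Every coordinate moves by at most ε, which is exactly the L∞ bound mentioned above. For a deep network the only change would be to obtain the input gradient by backpropagation instead of the closed form.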

2.2 Basic Iterative Attack and Projected Gradient Descent

Kurakin et al. proposed the BIA method, which improves the performance of FGSM by running an iterative optimizer for several iterations. BIA executes FGSM with a small step size α and clips the updated adversarial sample to the valid range. There are T iterations in total, starting from x'_0 = x, and the update in the k-th iteration is as follows:

$$x'_{k+1} = \mathrm{Clip}\left\{x'_k + \alpha \cdot \mathrm{sign}\left(\nabla_x J(\theta, x'_k, y)\right)\right\}, \quad k = 0, 1, \ldots, T - 1$$

where αT = ε. Projected gradient descent (PGD) can be regarded as a generalized form of BIA that does not impose the constraint αT = ε. To constrain the adversarial perturbation, PGD projects the adversarial sample learned in each iteration into the ε-L∞ neighborhood of the benign sample, so that the adversarial perturbation stays smaller than ε. It is updated as follows:

$$x'_{k+1} = \mathrm{Proj}\left\{x'_k + \alpha \cdot \mathrm{sign}\left(\nabla_x J(\theta, x'_k, y)\right)\right\}$$


In this formula, Proj projects the updated adversarial sample into the ε-L∞ neighborhood of the benign sample and into the valid range.
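The iteration and projection described in this section can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the same kind of toy logistic-regression model as a stand-in for a DNN, a valid input range of [0, 1], and hypothetical helper names (`loss_grad_x`, `pgd`). BIA corresponds to the special case `alpha = eps / steps`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def loss_grad_x(w, b, x, y):
    """Input gradient of the cross-entropy loss for p = sigmoid(w.x + b)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def pgd(w, b, x, y, eps, alpha, steps):
    """Iterated sign-gradient steps of size alpha, each followed by Proj."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_grad_x(w, b, x_adv, y)
        x_adv = [xi + alpha * sign(gi) for xi, gi in zip(x_adv, g)]
        # Proj, part 1: back into the eps-L_inf neighborhood of the benign x.
        x_adv = [min(x0 + eps, max(x0 - eps, xi)) for x0, xi in zip(x, x_adv)]
        # Proj, part 2: back into the valid range (assumed to be [0, 1]).
        x_adv = [min(1.0, max(0.0, xi)) for xi in x_adv]
    return x_adv

w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, 0.2, 0.7], 1
x_pgd = pgd(w, b, x, y, eps=0.1, alpha=0.04, steps=5)      # alpha * T > eps
x_bia = pgd(w, b, x, y, eps=0.1, alpha=0.1 / 5, steps=5)   # BIA: alpha * T = eps
```

Because PGD allows `alpha * steps` to exceed `eps`, the projection is what keeps the perturbation bounded; with the BIA step size the bound is never exceeded in the first place.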

2.3 Momentum Iterative Attack

Inspired by the momentum optimizer, Dong et al. proposed integrating a momentum memory into the iterative process of BIA and derived a new iterative algorithm, momentum iterative FGSM (MI-FGSM). The method iteratively updates its adversarial sample as follows:

$$x'_{k+1} = \mathrm{Clip}\left\{x'_k + \alpha \cdot \mathrm{sign}(g_{k+1})\right\}$$

where μ is the momentum decay factor and the accumulated gradient g is updated by:

$$g_{k+1} = \mu \cdot g_k + \frac{\nabla_x J(\theta, x'_k, y)}{\left\|\nabla_x J(\theta, x'_k, y)\right\|_1}$$

The scheme proposed in the literature attacks an ensemble of models in order to attack an unseen model in the black-box/gray-box setting. The basic idea is to consider the gradients of multiple models with respect to the input and to combine them into a single gradient direction. Adversarial samples generated by this attack are more likely to transfer to other black-box/gray-box models. The combination of MI-FGSM and the ensemble attack scheme won first place in both the non-targeted and targeted attack competitions (black-box setting) at NIPS 2017.
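A minimal sketch of the momentum update described above, again on a toy logistic-regression model used purely for illustration. The function and parameter names (`mi_fgsm`, `mu`) are assumptions, the step size is taken as `eps / steps`, and Clip is interpreted here as clipping to both the ε-neighborhood and an assumed [0, 1] input range.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def loss_grad_x(w, b, x, y):
    """Input gradient of the cross-entropy loss for p = sigmoid(w.x + b)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def mi_fgsm(w, b, x, y, eps, steps, mu=1.0):
    alpha = eps / steps              # assumed step size
    g = [0.0] * len(x)               # accumulated (momentum) gradient, g_0 = 0
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_grad_x(w, b, x_adv, y)
        l1 = sum(abs(v) for v in grad) or 1.0   # avoid dividing by zero
        # g_{k+1} = mu * g_k + grad / ||grad||_1
        g = [mu * gi + gr / l1 for gi, gr in zip(g, grad)]
        # x'_{k+1} = Clip{x'_k + alpha * sign(g_{k+1})}
        x_adv = [xi + alpha * sign(gi) for xi, gi in zip(x_adv, g)]
        x_adv = [min(x0 + eps, max(x0 - eps, xi)) for x0, xi in zip(x, x_adv)]
        x_adv = [min(1.0, max(0.0, xi)) for xi in x_adv]
    return x_adv

w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, 0.2, 0.7], 1
x_adv = mi_fgsm(w, b, x, y, eps=0.1, steps=5)
```

For an ensemble attack, one plausible adaptation would be to replace `loss_grad_x` with a combination of the input gradients of several models; that variant is not shown here.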

2.4 Carlini and Wagner attacks

Carlini and Wagner proposed a set of optimization-based adversarial attacks (C&W) that can generate adversarial samples, denoted CW0, CW2, and CW∞, under L0, L2, and L∞ norm constraints respectively. Similar to L-BFGS, the optimization objective is expressed as:

$$\min_{\delta} \; D(x, x + \delta) + c \cdot f(x + \delta) \quad \text{subject to} \quad x + \delta \in [0, 1]$$

where δ is the adversarial perturbation; D(∙,∙) represents the L0, L2, or L∞ distance measure; c > 0 is a constant that balances the two terms; and f(x + δ) is a customized adversarial loss that satisfies f(x + δ) ≤ 0 if and only if the prediction of the DNN is the attack target. To make sure that x + δ yields a valid image (that is, x + δ ∈ [0, 1]), a new variable κ is introduced to replace δ:

$$\delta = \frac{1}{2}\left(\tanh(\kappa) + 1\right) - x$$

Thus, x + δ = 1/2(tanh(κ) + 1) always lies in [0, 1] during optimization. In addition to achieving a 100% attack success rate on normally trained DNN models for MNIST, CIFAR10, and ImageNet, the C&W attack also broke defensively distilled models, on which L-BFGS and DeepFool fail to find adversarial samples.
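The change of variable can be illustrated with a small sketch. The assumptions here are deliberate simplifications: the model is a toy linear classifier with logits W·x + b, the distance is squared L2, the loss f is a margin between the best non-target logit and the target logit with a confidence margin `conf`, and plain gradient descent is applied to the new variable (called `k` in the code). All names and hyperparameters (`cw_l2`, `conf`, `lr`, `steps`) are hypothetical choices for illustration.

```python
import math

def to_image(k):
    """Change of variable: x + delta = 0.5 * (tanh(k) + 1) always lies in [0, 1]."""
    return [0.5 * (math.tanh(ki) + 1.0) for ki in k]

def logits(W, b, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def cw_l2(W, b, x, target, c=1.0, conf=0.2, lr=0.05, steps=500):
    """Minimize ||x' - x||^2 + c * f(x') over k, where x' = to_image(k) and
    f(x') = max(max_{i != target} Z_i - Z_target, -conf)."""
    # Start at the benign sample; clamp so that atanh stays finite.
    k = [math.atanh(2.0 * min(max(xi, 1e-6), 1.0 - 1e-6) - 1.0) for xi in x]
    for _ in range(steps):
        x_adv = to_image(k)
        z = logits(W, b, x_adv)
        other = max((i for i in range(len(z)) if i != target), key=lambda i: z[i])
        active = z[other] - z[target] > -conf    # f still contributes a gradient
        for i in range(len(k)):
            grad_x = 2.0 * (x_adv[i] - x[i])     # gradient of the distance term
            if active:
                grad_x += c * (W[other][i] - W[target][i])
            dx_dk = 0.5 * (1.0 - math.tanh(k[i]) ** 2)   # chain rule through tanh
            k[i] -= lr * grad_x * dx_dk
    return to_image(k)

W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x = [0.8, 0.2]                      # classified as class 0 by the toy model
x_adv = cw_l2(W, b, x, target=1)    # pushed just across the boundary into class 1
```

No explicit clipping appears anywhere in the loop: the box constraint is satisfied by construction, which is the point of the substitution.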

2.5 Universal adversarial attack

All of the above attacks carefully craft an adversarial perturbation for a specific benign sample. In other words, the adversarial perturbations do not transfer across benign samples. So a natural question is: is there a universal perturbation that fools the network on most benign samples? One answer is an iterative algorithm that builds such a perturbation. In each iteration, for the benign samples that the current perturbation cannot yet fool, an optimization problem similar to L-BFGS is solved to find the minimum additional perturbation needed to fool these samples. The additional perturbation is then added to the current perturbation. Eventually, the perturbation makes most benign samples fool the network. Experiments show that this simple iterative algorithm can effectively attack deep neural networks such as CaffeNet, GoogleNet, VGG, and ResNet. Surprisingly, this perturbation, which transfers across samples, also transfers across models. For example, the universal perturbation crafted on VGG can also achieve a fooling rate of more than 53% on other models.
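The accumulation loop described above can be sketched on a toy problem. The sketch makes several assumptions that are not in the text: the model is a linear classifier, so the "minimum additional perturbation" has a closed form that stands in for the inner optimization; the accumulated perturbation is projected onto an L2 ball of radius `radius`; and the loop stops once a target fooling rate is reached. All function names are hypothetical.

```python
import math

def predict(W, b, x):
    z = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    return max(range(len(z)), key=lambda i: z[i]), z

def minimal_step(W, b, x, label, overshoot=0.02):
    """Smallest L2 step that carries x across the nearest decision boundary of a
    linear classifier, slightly overshot so that the label really flips."""
    _, z = predict(W, b, x)
    best = None
    for j in range(len(W)):
        if j == label:
            continue
        wdiff = [wj - wl for wj, wl in zip(W[j], W[label])]
        norm2 = sum(v * v for v in wdiff)
        dist = (z[label] - z[j]) / math.sqrt(norm2)
        if best is None or dist < best[0]:
            scale = (1.0 + overshoot) * (z[label] - z[j]) / norm2
            best = (dist, [scale * v for v in wdiff])
    return best[1]

def project_l2(v, radius):
    norm = math.sqrt(sum(vi * vi for vi in v))
    return v if norm <= radius else [vi * radius / norm for vi in v]

def fooling_rate(W, b, samples, labels, v):
    fooled = sum(predict(W, b, [xi + vi for xi, vi in zip(x, v)])[0] != y
                 for x, y in zip(samples, labels))
    return fooled / len(samples)

def universal_perturbation(W, b, samples, radius, target_rate=0.7, max_passes=10):
    labels = [predict(W, b, x)[0] for x in samples]   # clean predictions
    v = [0.0] * len(samples[0])
    for _ in range(max_passes):
        for x, y in zip(samples, labels):
            x_pert = [xi + vi for xi, vi in zip(x, v)]
            if predict(W, b, x_pert)[0] == y:         # v does not fool this sample yet
                dv = minimal_step(W, b, x_pert, y)
                v = project_l2([vi + di for vi, di in zip(v, dv)], radius)
        if fooling_rate(W, b, samples, labels, v) >= target_rate:
            break
    return v, labels

W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
samples = [[0.45, 0.5], [0.6, 0.4], [0.7, 0.5], [0.55, 0.45]]
v, labels = universal_perturbation(W, b, samples, radius=0.3)
rate = fooling_rate(W, b, samples, labels, v)
```

In this two-class toy a single vector cannot fool samples from both classes at once, so the fooling rate stays below 100%; the example is only meant to show the accumulate-and-project structure of the loop.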

3. Adversarial defense methods

Where there is attack, there is defense. Current academic methods for defending against adversarial samples fall into two categories. One is adversarial training, in which adversarial samples are added during training to make the classification model robust to them. The other is to train a separate "adversarial sample detector". Related work [] has shown that adversarial training cannot produce robustness against many different kinds of attack at once (for example, robustness to L1-PGD and robustness to L∞-PGD are theoretically incompatible). We have also run some experiments on adversarial training and found that, even for the same attack, a model that has undergone adversarial training does not generalize to different attack strengths. Although an adversarial sample detector adds inference time, its generalization across attack types and attack strengths is much better than that of adversarial training. At the same time, [4] showed that a model trained with an adversarial-training-style procedure can also become relatively robust and stable against adversarial attacks.

3.1 Adversarial training

Adversarial training is an intuitive defense against adversarial samples: it attempts to improve the robustness of a neural network by training it on adversarial samples. Formally, this is a min-max game, which can be expressed as:

$$\min_{\theta} \; \max_{D(x, x') < \eta} \; J(\theta, x', y)$$

where J(θ, x', y) is the adversarial loss function; θ is the network weights; x' is the adversarial input; y is the ground-truth label; D(x, x') is some measure of distance between x and x'; and η bounds that distance. The inner maximization problem is to find the most effective adversarial sample, which can be done with a well-designed adversarial attack such as FGSM or PGD. The outer minimization problem is the standard training procedure that minimizes the loss. The resulting network should be able to resist the adversarial attack that was used to generate adversarial samples during training. Recent studies have shown that adversarial training is one of the most effective defenses against adversarial attacks, mainly because the method achieves the highest accuracy on several benchmark data sets.

FGSM adversarial training: Goodfellow et al. first proposed training a neural network on both benign samples and adversarial samples generated by FGSM to enhance the robustness of the network. Their adversarial objective function can be expressed as:

$$\tilde{J}(\theta, x, y) = c \, J(\theta, x, y) + (1 - c) \, J\left(\theta, \; x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right), \; y\right)$$


where x + ε·sign(∇_x J(θ, x, y)) is the adversarial sample generated from the benign sample x by FGSM, and c balances the accuracy on benign and adversarial samples. Experiments in the literature show that the network becomes somewhat robust to adversarial samples generated by FGSM. Specifically, the error rate on adversarial samples dropped sharply from 89.4% to 17.9% after adversarial training. Although the method is effective against FGSM attacks, the trained model is still vulnerable to iterative and optimization-based adversarial attacks. Therefore, many studies have further explored adversarial training with stronger adversarial attacks (such as BIA/PGD attacks).
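The mixed objective described above can be sketched as follows. This is a toy illustration only: the model is logistic regression, the data are four hand-made points, the adversarial sample is treated as a constant when differentiating with respect to the weights, and the hyperparameters (`eps`, `c`, `lr`, `epochs`) are arbitrary assumptions rather than values from the literature.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def prob(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, y):
    """Cross-entropy J(theta, x, y), with p clamped to keep log() finite."""
    p = min(max(prob(w, b, x), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def fgsm(w, b, x, y, eps):
    """x + eps * sign(grad_x J); the input gradient of J is (p - y) * w."""
    p = prob(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def adv_objective(w, b, x, y, eps, c):
    """c * J(x) + (1 - c) * J(x_adv): the mixed benign/adversarial loss."""
    return c * loss(w, b, x, y) + (1.0 - c) * loss(w, b, fgsm(w, b, x, y, eps), y)

def train(data, eps=0.1, c=0.5, lr=0.5, epochs=200):
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = fgsm(w, b, x, y, eps)     # inner step: craft the adversarial sample
            gw, gb = [0.0] * len(w), 0.0
            for xs, weight in ((x, c), (x_adv, 1.0 - c)):
                err = prob(w, b, xs) - y      # dJ/dz for cross-entropy
                gw = [g + weight * err * xi for g, xi in zip(gw, xs)]
                gb += weight * err
            w = [wi - lr * g for wi, g in zip(w, gw)]   # outer step: minimize
            b -= lr * gb
    return w, b

data = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]
w, b = train(data)
```

Setting `c = 1.0` recovers ordinary training. Replacing `fgsm` with an iterative attack in the inner step would give the stronger BIA/PGD-style adversarial training mentioned above.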

3.2 Randomization

Many recent defenses employ randomization to mitigate the effects of adversarial perturbations in the input or feature domain, based on the intuition that DNNs are always robust to random perturbations. Randomization-based defenses attempt to turn adversarial effects into random effects, which are not a concern for most DNNs. These defenses perform well in black-box and gray-box settings, but in the white-box setting the expectation over transformation (EOT) method can break most of them by taking the random process into account during the attack.
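The interplay between a randomized defense and an attack that accounts for the randomness can be sketched as follows. The sketch assumes one specific, simple randomization (additive Gaussian input noise, averaged over `n` draws) and a toy logistic-regression model; real defenses use other random transformations, and the function names here are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def noisy(x, sigma, rng):
    """Assumed random transformation: additive Gaussian noise on the input."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]

def randomized_predict(w, b, x, sigma, n, rng):
    """Defense: average the predicted probability over n random noisy copies."""
    return sum(prob(w, b, noisy(x, sigma, rng)) for _ in range(n)) / n

def eot_gradient(w, b, x, y, sigma, n, rng):
    """Attack side: average the input gradient of the loss over the same
    random process, i.e. a Monte Carlo estimate of the expected gradient."""
    g = [0.0] * len(x)
    for _ in range(n):
        p = prob(w, b, noisy(x, sigma, rng))
        g = [gi + (p - y) * wi / n for gi, wi in zip(g, w)]
    return g

w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, 0.2, 0.7], 1
g = eot_gradient(w, b, x, y, sigma=0.1, n=50, rng=random.Random(0))
# One sign-gradient step along the averaged gradient.
x_adv = [xi + 0.1 * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
```

Because the gradient is averaged over the defense's own randomness, the resulting step still lowers the randomized prediction for the true label, which is why such defenses tend to fail in the white-box setting.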

3.3 Denoising

Denoising is a very straightforward approach to reducing adversarial perturbations and their effects. Previous work points to two directions for designing such defenses: input denoising and feature-map denoising. The first direction attempts to partially or completely remove the adversarial perturbation from the input, and the second attempts to reduce the effect of the adversarial perturbation on the high-level features learned by the DNN.
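As a concrete, deliberately simple illustration of input denoising, the sketch below quantizes each input value to a small number of levels (bit-depth reduction). This particular technique is chosen here only as an example and is not named in the text above; the function name `squeeze_bit_depth` is hypothetical, and inputs are assumed to lie in [0, 1].

```python
def squeeze_bit_depth(x, bits):
    """Quantize each value in [0, 1] to 2**bits levels. Perturbations smaller
    than half a quantization step are rounded away."""
    levels = 2 ** bits - 1
    return [round(min(1.0, max(0.0, xi)) * levels) / levels for xi in x]

x = [0.2, 0.8]                  # benign input
x_adv = [0.21, 0.79]            # input with a small adversarial perturbation
clean_in = squeeze_bit_depth(x, bits=3)
adv_in = squeeze_bit_depth(x_adv, bits=3)
# clean_in and adv_in are identical, so the model would see the same input.
```

The design trade-off is visible even in this toy: coarser quantization removes larger perturbations but also discards more of the benign signal.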

3.4 Bayesian model based defense

Liu et al. combined a Bayesian neural network (BNN) with adversarial training in order to learn the optimal distribution of model weights under adversarial attack. Specifically, the authors assume that all weights in the network are random and train the network using techniques commonly used in BNN theory. Through adversarial training, this random BNN significantly improved adversarial robustness compared with RSE and with common adversarial training on CIFAR10, STL10, and ImageNet143. Schott et al. suggested modeling the class-conditional distribution of the input data with a Bayesian model, and classifying a new sample into the class whose class-conditional model gives it the highest likelihood. They named the model the Analysis by Synthesis (ABS) model. ABS is described as the first model on the MNIST data set that is robust to L0, L2, and L∞ attacks. ABS achieves state-of-the-art performance in resisting L0 and L2 attacks, but its performance under L∞ attacks is slightly worse than that of a PGD adversarially trained model.

4. Discussion

4.1 Realistic factors

Academia has proposed many adversarial attack methods, and under specific conditions most of them can achieve a 100% success rate against a model. In the physical world, however, the vast majority of academic methods are unable to break a biometric system. There are two reasons for this. First, adversarial attacks generalize poorly: against an algorithm whose model is unknown, their performance drops sharply. Second, and more importantly, in the physical world the data for a biometric algorithm comes from an acquisition system (the camera), so a generated adversarial sample has to pass through the acquisition system (the camera) before it reaches the algorithm. As pointed out in [6], the camera's acquisition process greatly weakens the adversarial effect and greatly reduces the attack success rate.

4.2 Differences between the research trends of adversarial attack and defense

Research on adversarial attacks mainly follows two directions. The first is to design more effective and powerful attacks for evaluating emerging defense systems, and the second is to realize adversarial attacks in the physical world. Kurakin first realized an adversarial attack in the physical world by using the expected value of the model gradient with respect to the input plus the random noise caused by environmental factors. Eykholt et al. further considered masks and fabrication errors, thus realizing adversarial perturbations of traffic signs. More recently, Cao et al. successfully generated adversarial objects that can deceive lidar-based detection systems. These results all confirm the existence of physical adversarial samples. In terms of defense, since most heuristic defenses are unable to withstand adaptive white-box attacks, researchers have begun to pay attention to provable defenses, which guarantee a certain level of defensive performance no matter what attack the attacker adopts. So far, however, scalability remains a common problem, so the development of defense systems faces more challenges than that of attacks.

4.3 Major Challenges

(1) The causality behind adversarial samples. Early studies attributed the emergence of adversarial samples to the model structure and learning methods, and researchers believed that appropriate strategies and network structures would significantly improve robustness to adversarial samples. Researchers have explored this line of thought, especially in work related to obfuscated gradients, but it may in fact be an unreasonable research direction. Instead, recent studies have found that the emergence of adversarial samples is more likely the result of high-dimensional data geometry and insufficient training data. Ludwig et al. showed that adversarially robust tasks require more data than normal ML tasks, and that the required data size scales up accordingly.

Based on the two problems above, academia is now studying how to move algorithmic attacks into the physical world, that is, how to carry out physical adversarial attacks. Some progress has already been made on physical adversarial attacks against general object detection and general object recognition, and research on physical adversarial attacks against biometric systems is gradually getting under way. This year, Huawei's Moscow research institute carried out a physical adversarial attack, AdvHat [7], on a public face model. The authors used an adversarial algorithm to generate a sticker placed on the hat area, which prevented the wearer from being recognized as themselves. However, the paper did not succeed in attacking specific target users, so there is still no effective means of attacking a specific account in real business scenarios. [8] also attempted to attack person detection in surveillance cameras. Therefore, defense against and detection of attacks in the physical world will be an important topic in computer vision in the future.

(2) The existence of a universal robust decision boundary. Many adversarial attack methods are defined under different metrics. Although PGD adversarial training shows significant resistance to various L∞ attacks, the literature shows that it is still vulnerable to adversarial attacks under other norms, such as EAD and CW2. The decision boundaries that are optimal under different norms also differ, and their differences increase with the codimension of the data set (that is, the difference between the dimension of the data manifold and the dimension of the whole data space).

(3) Effective defense against white-box attacks. We have still not seen a defense that strikes a good balance between effectiveness and efficiency. In terms of effectiveness, adversarial training shows the best performance, but its computational cost is high. In terms of efficiency, many defense/detection systems based on randomization and denoising can be configured in seconds. However, a number of recent papers have shown that these defenses are not as effective as they claim. Provable defense theory points toward a way of achieving adversarial defense, but its accuracy and effectiveness are far from meeting practical requirements.


[1] Szegedy, Christian, et al. "Intriguing properties of neural networks." arXiv preprint arXiv:1312.6199 (2013).

[2] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.

[3] Tramèr, Florian, and Dan Boneh. "Adversarial Training and Robustness for Multiple Perturbations." arXiv preprint arXiv:1904.13000 (2019).

[4] Madry, Aleksander, et al. "Towards deep learning models resistant to adversarial attacks." arXiv preprint arXiv:1706.06083 (2017).

[5] Tramèr, Florian, and Dan Boneh. "Adversarial Training and Robustness for Multiple Perturbations." arXiv preprint arXiv:1904.13000 (2019).

[6] Kurakin, Alexey, Ian Goodfellow, and Samy Bengio. "Adversarial examples in the physical world." arXiv preprint arXiv:1607.02533 (2016).

[7] Komkov, Stepan, and Aleksandr Petiushko. "AdvHat: Real-world adversarial attack on ArcFace Face ID system." arXiv preprint arXiv:1908.08705 (2019).

[8] Thys, Simen, Wiebe Van Ranst, and Toon Goedemé. "Fooling automated surveillance cameras: adversarial patches to attack person detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2019.