[AI Security] Dreadnode Crucible granny #DEFCON-31

1. Introduction

Once upon a time in the mystical kingdom of AI, there lived a cunning wolf known for its deceiving ways. This wasn’t any ordinary wolf though; it was an avatar in the vast digital realm of Machine Learning Forest, often identified by the keen eyes of the vigilant MobileNetV2.

However, the wolf was not content with its identity and yearned for a transformation - an evasion. It dreamt of an enchanted costume, one that could confuse even the sharpest observers. A costume so convincing that it would allow the wolf to be perceived as something entirely different.

Now, what could a wolf possibly wish to become? Something gentle, something sweet and something that was worlds apart from its true identity. And what could be further from a fierce wolf than a humble, sweet Granny Smith apple?

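In short: given the wolf image, generate an adversarial example (AE), i.e. a perturbed version of the image, that the MobileNetV2 deep neural network classifies as Granny Smith (a variety of green apple).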

2. Preparation

2.1 Set variables

# Set variables
CHALLENGE = "granny"
CRUCIBLE_URL = "https://platform.dreadnode.io"
CHALLENGE_URL = "https://granny.platform.dreadnode.io"
ARTIFACT_FILES = ['reference.png']
CRUCIBLE_API_KEY = "0sCLPbA2w3uRLkRyzWM3kf0OTrglR6-k"

2.2 Download the data

# Download the challenge artifacts
import requests

for artifact in ARTIFACT_FILES:
    url = f"{CRUCIBLE_URL}/api/artifacts/{CHALLENGE}/{artifact}"
    headers = {"X-API-Key": CRUCIBLE_API_KEY}
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        with open(artifact, "wb") as file:
            file.write(response.content)
        print(f"{artifact} was successfully downloaded")
    else:
        print(f"Failed to download {artifact}")

Running this downloads reference.png, the wolf image that we will perturb.

2.3 Send requests

# Send a scoring request
import base64
import requests

# Base64-encode the reference image (b64encode returns bytes)
with open("reference.png", "rb") as f:
    reference = base64.b64encode(f.read())

def query(input_data):
    # JSON cannot hold bytes, so decode the base64 bytes to a str first
    if isinstance(input_data, bytes):
        input_data = input_data.decode()
    response = requests.post(
        f"{CHALLENGE_URL}/score",
        headers={"X-API-Key": CRUCIBLE_API_KEY},
        json={"data": input_data},
    )
    return response.json()

print(query(reference))
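
The `isinstance(..., bytes)` check in `query` matters because `base64.b64encode` returns `bytes`, and a JSON encoder cannot serialize `bytes`. A minimal stdlib sketch of the payload construction; `build_payload` is a hypothetical helper, and the `{"data": ...}` shape is taken from `query` above:

```python
import base64
import json


def build_payload(image_bytes: bytes) -> str:
    """Hypothetical helper: wrap raw image bytes the same way query() does."""
    encoded = base64.b64encode(image_bytes)  # bytes
    return json.dumps({"data": encoded.decode()})  # decode to str before JSON encoding


# Serializing the raw bytes object fails, which is why query() decodes first
try:
    json.dumps({"data": base64.b64encode(b"\x89PNG")})
except TypeError as exc:
    print("TypeError:", exc)

payload = build_payload(b"\x89PNG")  # the first four bytes of any PNG file
print(payload)
# Round trip: the server can recover the exact bytes
assert base64.b64decode(json.loads(payload)["data"]) == b"\x89PNG"
```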

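The server responds with the probability MobileNetV2 assigns to each class for the image, listed from most to least likely (output truncated):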

➜  granny python main.py
reference.png was successfully downloaded
{'output': [[0.28575703501701355, 'timber wolf'], [0.13912102580070496, 'red wolf'], [0.1217745989561081, 'coyote'], [0.02413186803460121, 'dingo'], [0.01603207364678383, 'Eskimo dog'], [0.014261397533118725, 'white wolf'], [0.011621303856372833, 'malamute'], [0.005718537140637636, 'dhole'], [0.005564256105571985, 'Siberian husky'], [0.005237041972577572, 'grey fox'], [0.003906863275915384, 'Norwegian elkhound'], [0.0017009602161124349, 'red fox'], [0.0016584355616942048, 'dogsled'], [0.001648851903155446, 'kit fox'], [0.0015738625079393387, 'German shepherd'], [0.0014953723875805736, 'African hunting dog'], [0.0014643404865637422, 'muzzle'], [0.001335897482931614, 'great grey owl'], [0.0012872095685452223, 'bighorn'], [0.001251729903742671, 'lynx'], [0.0011842790991067886, 'kelpie'], [0.0011549243936315179, 'tiger'], [0.0011337717296555638, 'impala'], [0.000954866292886436, 'nematode'], [0.0009288600995205343, 'fur coat'], [0.0009189517004415393, 'sea slug'], [0.0009050649823620915, 'electric locomotive'], [0.0008787056431174278, 'tennis ball'], [0.0008754126611165702, 'ant'], [0.0008634432451799512, 'flagpole'], [0.0008441561367362738, 'cougar'], [0.0008401128579862416, 'ram'], [0.0008386607514694333, 'chiton'], [0.0008285755175165832, 'hen-of-the-woods'], [0.0008020748500712216, 'tiger beetle'], [0.0007974817417562008, 'macaw'],......
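
The `output` field is a list of `[probability, label]` pairs. A small sketch for pulling out the top-1 label and the probability of one class; `top1` and `prob_of` are hypothetical helpers, and `sample` is a truncated copy of the first three entries of the response logged above:

```python
def top1(result):
    """Return (label, probability) of the most likely class in a /score response."""
    prob, label = max(result["output"], key=lambda pair: pair[0])
    return label, prob


def prob_of(result, label):
    """Return the probability assigned to `label`, or 0.0 if it is absent."""
    for prob, name in result["output"]:
        if name == label:
            return prob
    return 0.0


# First three entries of the response logged above
sample = {"output": [[0.28575703501701355, "timber wolf"],
                     [0.13912102580070496, "red wolf"],
                     [0.1217745989561081, "coyote"]]}
print(top1(sample))                     # ('timber wolf', 0.28575703501701355)
print(prob_of(sample, "Granny Smith"))  # 0.0 (absent from this truncated sample)
```

Using `max` instead of taking the first entry avoids relying on the server's ordering.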

2.4 Get the class indices

import torch
from torchvision import models

# Load the pretrained MobileNetV2 weights (ImageNet-1K, V2)
weights = models.MobileNet_V2_Weights.IMAGENET1K_V2
model = models.mobilenet_v2(weights=weights)

# ImageNet class names, indexed by class ID
class_mapping = weights.meta["categories"]

# Look up the IDs of the source and target classes
target_class1 = "timber wolf"
target_class2 = "Granny Smith"

for idx, class_name in enumerate(class_mapping):
    if target_class1.lower() in class_name.lower():
        print(f"Found class '{class_name}' with ID: {idx}")
        class_id = idx
    if target_class2.lower() in class_name.lower():
        print(f"Found class '{class_name}' with ID: {idx}")
        class_id2 = idx

Output:

➜  granny python main.py
Found class 'timber wolf' with ID: 269
Found class 'Granny Smith' with ID: 948
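
The substring test works for these two names, but it is fragile for shorter queries: searching for `"wolf"` would also match `white wolf`, `red wolf`, and other classes, and the loop would silently keep only the last hit. A sketch of an exact, case-insensitive lookup that fails loudly instead; `find_class_ids` is a hypothetical helper, and the four-item `categories` list is illustrative rather than the real ImageNet list:

```python
def find_class_ids(categories, names):
    """Map each requested class name to its index using exact, case-insensitive matches."""
    lookup = {}
    for idx, category in enumerate(categories):
        lookup.setdefault(category.lower(), idx)  # keep the first index if a name repeats
    ids = {}
    for name in names:
        key = name.lower()
        if key not in lookup:
            raise KeyError(f"class {name!r} not found")
        ids[name] = lookup[key]
    return ids


# Illustrative slice; real code would pass weights.meta["categories"]
categories = ["timber wolf", "white wolf", "red wolf", "Granny Smith"]
print(find_class_ids(categories, ["timber wolf", "Granny Smith"]))
# {'timber wolf': 0, 'Granny Smith': 3}
```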

2.5 Match the server's MobileNetV2 preprocessing

Start from the preprocessing parameters in the official PyTorch documentation: https://pytorch.org/vision/main/models/generated/torchvision.models.mobilenet_v2.html

After trying several settings, resize_size = 256 with crop_size = 224 gave local predictions closest to the scores returned by the remote server: the local timber wolf probability below (0.2857571542263031) agrees with the server's 0.28575703501701355 to six decimal places.

# Evaluate the reference image with the pretrained MobileNetV2
from torchvision import transforms
from torchvision.transforms import InterpolationMode
from PIL import Image

wolf_id = 269
apple_id = 948

image_path = "reference.png"

model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V2)
model.eval()  # Inference mode: batch norm uses running statistics, dropout is disabled

preprocess = transforms.Compose([
    transforms.Resize(256, interpolation=InterpolationMode.BILINEAR),
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # Converts to a float tensor with values in [0.0, 1.0]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
image = Image.open(image_path).convert("RGB")

input_tensor = preprocess(image)
input_batch = input_tensor.unsqueeze(0)  # Add a batch dimension: (1, 3, 224, 224)

# Forward pass without tracking gradients
with torch.no_grad():
    output = model(input_batch)

probabilities = torch.nn.functional.softmax(output[0], dim=0)
print("timber wolf probability:", probabilities[wolf_id].item())

Output:

➜  granny python main.py
Found class 'timber wolf' with ID: 269
Found class 'Granny Smith' with ID: 948
timber wolf probability: 0.2857571542263031
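
`Resize(256)` with an integer size scales the shorter side to 256 while keeping the aspect ratio, and `CenterCrop(224)` then keeps the central 224 × 224 window. A stdlib sketch of that geometry, assuming torchvision's rounding (long side truncated with `int`, crop offsets `int(round((size - crop) / 2))`); `resize_then_crop_box` is a hypothetical helper for predicting which part of an image the model actually sees:

```python
def resize_then_crop_box(width, height, resize=256, crop=224):
    """Return the resized (w, h) and the (left, top, right, bottom) crop box in resized pixels."""
    # Resize(int): shorter side -> `resize`, longer side scaled and truncated
    if width <= height:
        new_w, new_h = resize, int(resize * height / width)
    else:
        new_w, new_h = int(resize * width / height), resize
    # CenterCrop: offsets rounded to the nearest pixel
    left = int(round((new_w - crop) / 2.0))
    top = int(round((new_h - crop) / 2.0))
    return (new_w, new_h), (left, top, left + crop, top + crop)


# A 640x480 image is resized to 341x256, then the central 224x224 window is kept
print(resize_then_crop_box(640, 480))  # ((341, 256), (58, 16, 282, 240))
# A 256x256 image keeps its size and loses a 16-pixel border on every side
print(resize_then_crop_box(256, 256))  # ((256, 256), (16, 16, 240, 240))
```

The second case is the one that matters later: a 256 × 256 input loses exactly 16 pixels per side.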

3. Attack

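3.1 Define the loss function

We need a loss that raises the Granny Smith score and lowers the timber wolf score, while keeping the adversarial example visually close to the original image. For continuous pixel values, closeness is measured with a norm such as L2 or L∞ rather than a Hamming distance; the loss below has no explicit distance term, so closeness comes from taking small steps and stopping as soon as the target is reached. Working on the logits $z$ (the model's pre-softmax outputs), as the code in 3.2 does, the loss is:

$$ \mathcal{L} = z_{\text{wolf}} - z_{\text{apple}} $$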
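
Using the logit margin $\mathcal{L} = z_{\text{wolf}} - z_{\text{apple}}$ has a clean interpretation: with $p_k = e^{z_k} / \sum_j e^{z_j}$, the softmax normalizer cancels in the ratio of the two class probabilities, so minimizing $\mathcal{L}$ directly raises the apple-to-wolf odds:

$$ \frac{p_{\text{apple}}}{p_{\text{wolf}}} = \frac{e^{z_{\text{apple}}}}{e^{z_{\text{wolf}}}} = e^{-(z_{\text{wolf}} - z_{\text{apple}})} = e^{-\mathcal{L}} $$

This only controls the odds between the two classes; the absolute Granny Smith probability still depends on every other logit, which is why the attack loop checks the softmax probability separately.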

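3.2 Run gradient descent

The loop stops once the Granny Smith probability exceeds 0.3. This leaves headroom, because converting the tensor back to an image lowers the probability.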

learning_rate = 1  # L2 length of each step in normalized-input space
while True:
    input_batch.requires_grad = True  # Track gradients with respect to the input image
    model.zero_grad()  # Clear gradients accumulated on the model parameters

    output = model(input_batch)
    # Logit margin: push "timber wolf" down and "Granny Smith" up
    loss = output[0, wolf_id] - output[0, apple_id]
    loss.backward()  # Backpropagate to get d(loss)/d(input)

    with torch.no_grad():
        grad = input_batch.grad
        grad = grad / torch.norm(grad)  # Normalize the gradient to unit L2 norm
        input_batch = input_batch - learning_rate * grad  # Gradient-descent step on the image

    probs = output.softmax(1)[0]  # Probabilities of the image before this step
    apple_prob = probs[apple_id]
    wolf_prob = probs[wolf_id]
    print("Granny Smith probability:", apple_prob.item())
    print("timber wolf probability:", wolf_prob.item())
    if apple_prob > 0.3:  # Stop once Granny Smith exceeds 0.3
        break

Output:

➜  granny python main.py
Granny Smith probability: 0.0006618537590838969
timber wolf probability: 0.2857571542263031
Granny Smith probability: 0.0015279407380148768
timber wolf probability: 0.09206391870975494
Granny Smith probability: 0.002136402763426304
timber wolf probability: 0.028267469257116318
Granny Smith probability: 0.00256370403803885
timber wolf probability: 0.011457344517111778
Granny Smith probability: 0.0030766830313950777
timber wolf probability: 0.005836689844727516
Granny Smith probability: 0.0038345830980688334
timber wolf probability: 0.0038260468281805515
Granny Smith probability: 0.004675824660807848
timber wolf probability: 0.0026970321778208017
Granny Smith probability: 0.005650049541145563
timber wolf probability: 0.0018584068166092038
Granny Smith probability: 0.007557385601103306
timber wolf probability: 0.0013900937046855688
Granny Smith probability: 0.010187835432589054
timber wolf probability: 0.001048899837769568
Granny Smith probability: 0.01829897239804268
timber wolf probability: 0.0008268445380963385
Granny Smith probability: 0.030838338658213615
timber wolf probability: 0.0006287696305662394
Granny Smith probability: 0.051031142473220825
timber wolf probability: 0.0005385194672271609
Granny Smith probability: 0.0682976171374321
timber wolf probability: 0.00044211061322130263
Granny Smith probability: 0.12347796559333801
timber wolf probability: 0.00035081306123174727
Granny Smith probability: 0.20108601450920105
timber wolf probability: 0.00025084661319851875
Granny Smith probability: 0.28898271918296814
timber wolf probability: 0.00020292497356422246
Granny Smith probability: 0.3443906307220459
timber wolf probability: 0.0001684601156739518
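
The loop above is normalized (L2) gradient descent on the input: each step moves the image a fixed distance `learning_rate` in the direction that most decreases the logit margin. The same control flow on a toy linear classifier, where the input gradient of $z_{\text{wolf}} - z_{\text{apple}}$ is simply the difference of two weight rows. The weights, starting point, step size, threshold, and class order are made up for illustration; this is not MobileNetV2:

```python
import math

# Hypothetical 3-class linear "model": logits = W @ x (rows: wolf, apple, other)
W = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
WOLF, APPLE = 0, 1


def logits(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def targeted_attack(x, learning_rate=0.5, threshold=0.5, max_steps=100):
    """Normalized gradient descent on loss = z_wolf - z_apple until p_apple > threshold."""
    x = list(x)
    for step in range(max_steps):
        probs = softmax(logits(x))
        if probs[APPLE] > threshold:
            return x, step, probs
        # For a linear model, d(z_wolf - z_apple)/dx = W[wolf] - W[apple]
        grad = [a - b for a, b in zip(W[WOLF], W[APPLE])]
        norm = math.sqrt(sum(g * g for g in grad))
        # Step of length learning_rate against the normalized gradient
        x = [xi - learning_rate * g / norm for xi, g in zip(x, grad)]
    return x, max_steps, softmax(logits(x))


x_adv, steps, probs = targeted_attack([2.0, 0.0])
print(steps, [round(p, 3) for p in probs])
```

Because every step has the same L2 length, the total perturbation after `steps` iterations is bounded by `steps * learning_rate`, which is the only thing keeping the adversarial example near the original.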

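3.3 Convert the tensor back to an image

Pad the 224 × 224 tensor back to 256 × 256 before saving. When the server applies Resize(256) and CenterCrop(224), the crop then lands exactly on the optimized 224 × 224 region instead of resampling the perturbation, which keeps the conversion from changing the probabilities more than necessary.

import numpy as np

def from_tensor_to_image(t_image):
    # (C, H, W) tensor -> (H, W, C) NumPy array
    np_image = t_image.permute(1, 2, 0).detach().numpy()
    MEAN = [0.485, 0.456, 0.406]
    STD = [0.229, 0.224, 0.225]
    pad_width = (256 - 224) // 2  # 16 pixels on each side
    # Undo Normalize and scale to [0, 255]; clip so out-of-range values do not overflow in uint8
    image = np.clip(255 * (np_image * STD + MEAN), 0, 255).astype(np.uint8)
    padding = ((pad_width, pad_width), (pad_width, pad_width), (0, 0))  # Pad height and width, not channels
    # Black (zero) border up to 256 x 256, then build a PIL image
    image = Image.fromarray(np.pad(image, pad_width=padding, mode="constant", constant_values=0))
    return image

Applying this function to the optimized tensor gives the adversarial image AE.png, which is saved in the next step.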
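
Two details of this conversion are worth checking in isolation. Geometrically, padding 16 pixels per side and center-cropping 224 from 256 returns exactly the optimized block, assuming the server uses the same pipeline as in 2.5 and that `Resize(256)` leaves a 256 × 256 image unchanged. In value terms, undoing `Normalize` and storing 8-bit integers quantizes every pixel, and any value the attack pushed outside [0, 1] must be clipped; both are plausible causes of the probability drop measured in 3.4. A stdlib sketch of both steps on a single channel; `to_uint8`, `pad`, and `center_crop` are hypothetical helpers, and `to_uint8` rounds to the nearest level whereas `astype(np.uint8)` truncates:

```python
MEAN, STD = 0.485, 0.229  # red-channel statistics from the Normalize step


def to_uint8(v, mean=MEAN, std=STD):
    """Undo Normalize for one value, then clip and round to an 8-bit level."""
    pixel = 255.0 * (v * std + mean)
    return int(round(min(255.0, max(0.0, pixel))))


def pad(grid, width, fill=0):
    """Constant-pad a 2-D list on all four sides, like np.pad(..., mode='constant')."""
    cols = len(grid[0]) + 2 * width
    top = [[fill] * cols for _ in range(width)]
    body = [[fill] * width + list(row) + [fill] * width for row in grid]
    bottom = [[fill] * cols for _ in range(width)]
    return top + body + bottom


def center_crop(grid, size):
    """Crop the central size x size block using torchvision-style rounded offsets."""
    top = int(round((len(grid) - size) / 2.0))
    left = int(round((len(grid[0]) - size) / 2.0))
    return [row[left:left + size] for row in grid[top:top + size]]


# A normalized 0.0 maps back to the channel mean; out-of-range values are clipped
print([to_uint8(v) for v in (0.0, 5.0, -5.0)])  # [124, 255, 0]

# Scaled-down version of: 224 -> pad 16 -> 256 -> crop 224
block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
assert center_crop(pad(block, 2), 4) == block
```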

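3.4 Check the AE's probabilities

Reload the saved AE.png from disk and run it through MobileNetV2 again, so that the 8-bit quantization and the padding are included in the test: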

adversarial_image = from_tensor_to_image(input_batch[0])
adversarial_image.save("AE.png")
AE_image = Image.open("AE.png").convert("RGB")

# Evaluate the adversarial example
input_tensor = preprocess(AE_image)
input_batch = input_tensor.unsqueeze(0)  # Create a mini-batch as expected by the model
with torch.no_grad():
    output = model(input_batch)

probabilities = torch.nn.functional.softmax(output[0], dim=0)
print("Wolf probability:", probabilities[wolf_id].item())
print("Apple probability:", probabilities[apple_id].item())

Output:

➜  granny python main.py
Wolf probability: 0.00037845675251446664
Apple probability: 0.15955209732055664

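The timber wolf probability is now tiny (about 0.00038), while the Granny Smith probability has reached 0.159. That is well below the value above 0.34 reached during optimization, which is the conversion loss the 0.3 threshold in 3.2 anticipated.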

4. Submit

# Submit the flag
import requests


def submit_flag(flag):
    url = f"{CRUCIBLE_URL}/api/challenges/{CHALLENGE}/submit-flag"
    headers = {"X-API-Key": CRUCIBLE_API_KEY}
    payload = {"challenge": CHALLENGE, "flag": flag}
    response = requests.post(url, headers=headers, json=payload)
    if response.status_code == 200:
        if response.json().get("correct") is True:
            print("The flag was correct. Congrats!")
        else:
            print("The flag was incorrect. Keep trying!")
    else:
        print("There was an error submitting your flag")
        print(response.text)

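# Score the adversarial image; the response carries the flag
with open("AE.png", "rb") as f:
    AE = base64.b64encode(f.read())
result = query(AE)
FLAG = list(result.values())[0]  # Assumes the flag is the first value in the response
print(FLAG)
submit_flag(FLAG)

Output: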

➜  granny python main.py
gAAAAABnz6uSFsd9S9eDRbtJrVsWAnu_HdAQyheTFj9dhrpFPnvGGa6KQBdYM2iJhlj6oz9T3bz7InE_ci0VZBmUFKUgE4plqWu_j8ysrWqdPQfs6iYMG7a2YJNlhWM4Aiw6fTcV01Giw7UuNq43SceKOWJzmw-dqA98weZi7EQj0Gjj_XlcxkSpA2WSsd5r55Pp6E-11iK3
The flag was correct. Congrats!
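
`list(result.values())[0]` depends on the flag being the first value in the response dictionary. A slightly more defensive sketch, under the assumption (consistent with the flag printed above, but not documented here) that Crucible flags are strings beginning with `gAAAAA`; `extract_flag` is a hypothetical helper and the sample responses are made up:

```python
def extract_flag(result, prefix="gAAAAA"):
    """Return the first string value in a response dict that looks like a flag, else None."""
    for value in result.values():
        if isinstance(value, str) and value.startswith(prefix):
            return value
    return None


print(extract_flag({"message": "ok", "flag": "gAAAAABexample"}))  # gAAAAABexample
print(extract_flag({"output": [[0.9, "Granny Smith"]]}))          # None
```

Submitting only when `extract_flag` returns a string avoids sending an error message or a probability list as a flag.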
