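Rowhammer is a hardware vulnerability in modern DRAM (Dynamic Random-Access Memory) chips: repeatedly accessing ("hammering") one row of memory cells can disturb neighboring rows enough to flip bits stored there.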
For years, this vulnerability was primarily studied on CPUs. However, researchers from the University of Toronto recently demonstrated a successful Rowhammer attack, dubbed GPUHammer, on an NVIDIA A6000 GPU with GDDR6 memory.
The most concerning consequence of a GPUHammer attack is the potential to degrade an AI model's accuracy. In one proof of concept, a single bit flip dropped a model's accuracy from 80% to less than 1%.
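To see why one flipped bit can be so damaging, consider what it does to a single stored weight. The sketch below is only an illustration: it assumes the weight is stored as a 16-bit IEEE 754 half-precision float and that the flip lands on the most significant exponent bit, neither of which is stated above. The helper name flip_bit_fp16 is hypothetical.

```python
import struct


def flip_bit_fp16(value, bit):
    """Return `value` after flipping one bit of its half-precision encoding.

    `bit` counts from 0 (least significant mantissa bit) to 15 (sign bit).
    """
    # Reinterpret the 16-bit float as an unsigned 16-bit integer.
    (raw,) = struct.unpack("<H", struct.pack("<e", value))
    # Flip the chosen bit and reinterpret the result as a float again.
    (flipped,) = struct.unpack("<e", struct.pack("<H", raw ^ (1 << bit)))
    return flipped


weight = 0.5
# Assumption for illustration: the flip hits bit 14, the top exponent bit.
corrupted = flip_bit_fp16(weight, 14)
print(weight, "->", corrupted)  # prints "0.5 -> 32768.0"
```

A flip in a low mantissa bit would barely change the value, while a flip in a high exponent bit inflates it by orders of magnitude, and a weight that large can dominate every computation it feeds into.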
How to Stay Safe: Mitigation and Prevention
NVIDIA has acknowledged the findings and issued guidance to help customers mitigate the risk.
1. Enable System-Level ECC
NVIDIA strongly recommends enabling System-Level ECC on GPUs with GDDR6 memory to prevent Rowhammer-style attacks.
For Data Center GPUs: Newer GPUs, such as those in the Blackwell and Hopper series, have on-die ECC enabled by default. On other models, you may need to enable System-Level ECC manually.
For Workstation GPUs: This is particularly relevant for professional GPUs like the NVIDIA A6000, where the research was conducted.
You should check your settings to ensure ECC is turned on.
You can check whether ECC is enabled using the nvidia-smi command-line tool.
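As a rough sketch, the check can be scripted so that it covers every GPU in a machine. The example below assumes nvidia-smi is on the PATH and uses its CSV query mode with the ecc.mode.current field. The function names are made up for illustration, and the sample values in the comments ("Enabled", "Disabled", "[N/A]") reflect typical output that may vary between driver versions.

```python
import subprocess

# Query each GPU's index, name and current ECC mode as header-less CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,ecc.mode.current",
    "--format=csv,noheader",
]


def parse_ecc_report(csv_text):
    """Parse lines of the form 'index, name, ecc_mode' into a list of dicts."""
    gpus = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3:
            continue  # skip blank or unexpected lines
        gpus.append({
            "index": int(parts[0]),
            # Re-join the middle fields in case a name contains a comma.
            "name": ", ".join(parts[1:-1]),
            "ecc": parts[-1],
        })
    return gpus


def gpus_without_ecc(gpus):
    """Return GPUs whose ECC mode is anything other than 'Enabled'.

    This includes GPUs that report '[N/A]' because they lack ECC support.
    """
    return [g for g in gpus if g["ecc"] != "Enabled"]


def query_ecc():
    """Run nvidia-smi and return the parsed per-GPU ECC report."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_ecc_report(result.stdout)


# Example usage on a machine with the NVIDIA driver installed:
#   for gpu in gpus_without_ecc(query_ecc()):
#       print(f"GPU {gpu['index']} ({gpu['name']}): ECC is {gpu['ecc']}")
```

If a GPU supports ECC but reports it as disabled, it can usually be turned on with nvidia-smi -e 1, which requires administrator privileges and takes effect only after a reboot.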
2. Consider Your Environment
While Rowhammer attacks are difficult to perform in a typical single-user environment, they pose a more significant threat in multi-tenant or cloud computing environments.
If you are operating in such an environment, ensuring that ECC is enabled is not just a recommendation; it is essential for maintaining data integrity and security.
In summary, while the threat of a Rowhammer attack on an NVIDIA GPU is real, the good news is that there are clear steps you can take to mitigate the risk. By enabling ECC and being aware of your computing environment, you can protect your valuable data and maintain the integrity of your system.