How to Use GPU
Ollama supports GPU acceleration for model inference. Here's how to configure it on Windows.
NVIDIA
Supported GPUs
- NVIDIA GeForce RTX series (20/30/40/50 and newer)
- NVIDIA GeForce GTX 16 series and above
- NVIDIA Tesla series
- 6GB+ VRAM recommended
- CUDA Compute Capability 7.0 or higher (see the check below)
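If you are unsure whether your card meets the compute-capability requirement, recent NVIDIA drivers can report it directly. A quick check (the `compute_cap` query field requires a reasonably new driver, so on older drivers this query may fail):

```powershell
# Print each GPU's name and CUDA compute capability
nvidia-smi --query-gpu=name,compute_cap --format=csv
```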
Install CUDA
- Visit the NVIDIA website to download the CUDA Toolkit (https://developer.nvidia.com/cuda-downloads)
- Select Windows and your system version
- Download and install the CUDA Toolkit (v11.7 or later recommended)
- Verify the installation by running `nvidia-smi`
- Restart Ollama to enable GPU acceleration
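Once Ollama is back up, you can confirm that inference actually runs on the GPU. A minimal check; the model name `llama3` below is only an example, so substitute any model you have pulled:

```powershell
# Load a model so Ollama has something resident in memory
ollama run llama3 "hello"

# The PROCESSOR column should report "100% GPU" (or a CPU/GPU split)
ollama ps

# nvidia-smi should also list the ollama process and its VRAM usage
nvidia-smi
```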
AMD
Supported GPUs
Officially supported:
- AMD Radeon RX 9000 series
- AMD Radeon RX 7000 series
- AMD Radeon RX 6000 series
- AMD Instinct series
- 6GB+ VRAM recommended
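If you are not sure which Radeon model your machine has, you can list the installed display adapters from PowerShell before deciding whether the official path or the workaround below applies:

```powershell
# List every display adapter Windows knows about
Get-CimInstance Win32_VideoController | Select-Object Name
```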
Install HIP
- Download and install the latest AMD drivers
- Install the HIP SDK (https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html)
- Restart Ollama to enable GPU acceleration
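After restarting, you can check whether the Radeon card was detected by searching Ollama's server log, which on Windows is written under `%LOCALAPPDATA%\Ollama`. The exact log wording varies between Ollama versions, so the pattern below is only a rough filter:

```powershell
# Look for GPU-detection lines in Ollama's server log
Select-String -Path "$env:LOCALAPPDATA\Ollama\server.log" -Pattern "amdgpu|rocm|gpu"
```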
Unsupported AMD GPUs
Some AMD GPUs lack official ROCm support, including the RX 500 series, the RX 5000 (RDNA) series, and integrated GPUs such as the Radeon 680M. Use the following workaround:
Ollama-for-AMD
- Visit https://github.com/likelovewant/ollama-for-amd
- Download the pre-compiled binaries, or build from source
- Download the pre-compiled rocblas and library files that match your GPU model
- Replace Ollama's rocblas.dll and library files with the downloaded ones (a sketch of this step follows the list)
- Restart Ollama
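As a concrete sketch of the replacement step: the paths below assume a default per-user Ollama install and that the downloaded files were unpacked to a `Downloads\rocblas` folder. Both locations are assumptions and can shift between Ollama versions, so verify them on your machine and keep a backup of the originals:

```powershell
# Assumed locations -- adjust to your actual install and download paths
$ollamaLib = "$env:LOCALAPPDATA\Programs\Ollama\lib\ollama"   # default install dir (assumption)
$download  = "$env:USERPROFILE\Downloads\rocblas"             # unpacked download (assumption)

# Back up the original DLL, then copy the replacements in
Copy-Item "$ollamaLib\rocblas.dll" "$ollamaLib\rocblas.dll.bak"
Copy-Item "$download\rocblas.dll"  "$ollamaLib\rocblas.dll" -Force
Copy-Item "$download\library\*"    "$ollamaLib\rocblas\library\" -Recurse -Force
```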
Easier Method
- Use the Ollama-For-AMD-Installer tool
- Select your GPU model and click "Check latest version"
- The tool will automatically complete all configuration
Important Notes
- If the GPU still isn't used (common on dual-GPU laptops), set an environment variable to force Ollama onto a specific GPU; see the example after this list
- Set your power plan to "High Performance" mode
- Keep your GPU drivers up to date
- Monitor VRAM usage to avoid running out of memory
- Close other GPU-intensive applications when using large models
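For the dual-GPU note above, Ollama honors the standard vendor device-selection variables: `CUDA_VISIBLE_DEVICES` for NVIDIA and `HIP_VISIBLE_DEVICES` for AMD. The index `0` below is only an illustration; use the index (or, for NVIDIA, the UUID) of the GPU you want, then restart Ollama so the change takes effect:

```powershell
# NVIDIA: expose only the first GPU to Ollama (persists for the current user)
setx CUDA_VISIBLE_DEVICES 0

# AMD: the equivalent selector for Radeon cards
setx HIP_VISIBLE_DEVICES 0
```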