Model Size Considerations
The performance of Deepseek models depends largely on how much RAM your Mac has (a quick way to check is shown after this list):
- Deepseek R1 8B: Runs smoothly on 16-18GB RAM machines, good balance of performance and resource usage
- Deepseek R1 14B: Operates at slower output speeds but still functional on 16GB RAM; performs significantly better with 32GB RAM
- Deepseek R1 32B: Will not run on lower-RAM machines; at least 32GB of RAM is strongly recommended
- General rule: More RAM provides better performance and allows for larger context windows
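If you're not sure how much memory your Mac has, you can check from Terminal. This is a quick sketch using the built-in sysctl and system_profiler tools; the exact output formatting varies slightly between macOS versions:
# Installed memory in GB (hw.memsize reports bytes)
sysctl -n hw.memsize | awk '{print $1/1073741824 " GB"}'
# Or read it from the hardware overview
system_profiler SPHardwareDataType | grep "Memory:"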
Prerequisites
- Apple Silicon Mac (M2, M3, or M4 series; M1 works but is noticeably slower)
- macOS Sequoia recommended
- RAM requirements:
- Minimum: 16GB RAM
- Recommended: 32GB+ RAM
- Sufficient free storage space (the 8B model is on the order of 5GB; the 14B and 32B variants need considerably more)
- Docker Desktop for Apple Silicon (download link in the Additional Resources section below)
- Ollama (download link in the Additional Resources section below)
- OpenWebUI (installed via Docker in Step 2)
Step 1: Installing Ollama
- Visit Ollama’s official website and download the Apple Silicon version
- Alternative installation via Homebrew:
brew install ollama
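Whichever install method you use, you can confirm the CLI is available before moving on (a minimal check; the version number will differ on your machine):
# Confirm the Ollama CLI is on your PATH and print its version
ollama --version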
Step 2: Installing OpenWebUI
- Install Docker Desktop for Apple Silicon:
- Download from Docker’s official website
- Follow the installation guide for Apple Silicon Macs
- Verify installation by running the following in Terminal:
docker --version
- Pull and run OpenWebUI:
docker pull ghcr.io/open-webui/open-webui:main
docker run -d -p 3000:8080 -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:main
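Before moving on, it's worth confirming the container actually started. This is a minimal check with standard Docker commands, assuming the image tag used above:
# The open-webui container should be listed with port 3000 mapped
docker ps --filter "ancestor=ghcr.io/open-webui/open-webui:main"
# Optionally tail its logs while it finishes starting up (Ctrl+C to stop)
docker logs -f $(docker ps -q --filter "ancestor=ghcr.io/open-webui/open-webui:main")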
Step 3: Installing Deepseek Models
- Choose your model size based on available RAM:
# For 16GB RAM machines (recommended)
ollama run deepseek-r1:8b
# For 32GB RAM machines or if you need more capabilities
ollama run deepseek-r1:14b
ollama run deepseek-r1:32b
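When the download completes, you can verify what's installed and give the model a quick smoke test straight from Terminal (the tag below assumes you pulled the 8B variant; substitute whichever one you chose):
# List locally installed models and their sizes on disk
ollama list
# One-off prompt without opening the UI
ollama run deepseek-r1:8b "Reply with a single short sentence."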
Step 4: Configuration
- Launch Ollama:
Either open the desktop app, or start the server from Terminal:
ollama serve
- Open OpenWebUI in your browser:
http://localhost:3000
- Initial Setup:
- On first launch, you’ll be prompted to create an admin account
- Choose a secure password (this is important even for local installations)
- After logging in, you’ll see the main chat interface
- Configure Backend Settings:
- Click on the settings icon in the left sidebar
- Navigate to “Backend Settings”
- Select “Ollama” as your backend type
- Set the API endpoint to
http://localhost:11434
- Click “Test Connection” to verify (a command-line check is sketched after this section)
- If the test fails, try http://host.docker.internal:11434 instead; from inside the Docker container, localhost refers to the container itself rather than your Mac
- Save your settings
- Model Configuration:
- Go to the “Models” section in settings
- Click “Download New Model”
- Select your preferred Deepseek model based on your RAM:
- For 16GB RAM: deepseek-r1:8b
- For 32GB RAM: deepseek-r1:14b or deepseek-r1:32b
- Configure model parameters:
- Temperature: 0.7 (default, adjust for creativity vs precision)
- Context Length:
- 8B model: up to 8192 tokens
- 14B model: reduce to 4096 tokens on 16GB RAM
- Top P: 0.9 (recommended for code generation)
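If “Test Connection” fails, or you simply want to confirm the backend outside the UI, you can query Ollama's HTTP API directly from your Mac's Terminal. This assumes Ollama is serving on its default port 11434 (note that from inside the Docker container the host is host.docker.internal rather than localhost):
# The root endpoint returns a short status message when the server is up
curl http://localhost:11434
# Lists the models Ollama can serve, as JSON
curl http://localhost:11434/api/tags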
Optimizing Performance
- Close unnecessary applications
- Monitor memory usage with Activity Monitor (or from Terminal, as sketched below)
- macOS manages swap automatically; the command below raises the open-file limit instead, which can help if Ollama or Docker reports “too many open files”:
sudo launchctl limit maxfiles 65535 200000
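If you prefer watching memory from Terminal rather than Activity Monitor, macOS ships two small utilities that work well here; output details vary a little between macOS versions:
# Summary of current memory pressure and free-memory percentage
memory_pressure
# Raw virtual-memory statistics, sampled every 2 seconds (Ctrl+C to stop)
vm_stat 2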
Troubleshooting
Common Issues and Solutions
- Memory Pressure:
- Reduce context length
- Close other resource-intensive applications
- Monitor memory usage in Activity Monitor
- Slow Response Times:
- Check CPU/GPU usage
- Verify no other intensive processes are running
- Consider reducing model parameters
- Connection Issues:
- Verify Ollama is running (ollama serve)
- Check OpenWebUI Docker container status
- Confirm port availability (see the commands below)
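For the connection issues above, a few Terminal checks usually narrow the problem down quickly. This assumes the default ports used earlier in this guide (3000 for OpenWebUI, 11434 for Ollama):
# Is anything listening on the OpenWebUI and Ollama ports?
lsof -iTCP:3000 -sTCP:LISTEN
lsof -iTCP:11434 -sTCP:LISTEN
# Is the OpenWebUI container still running?
docker ps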
Additional Resources and Links
- Ollama:
- Download page: https://ollama.ai/download
- Other models you can add: https://ollama.com/library
- GitHub repository: https://github.com/ollama/ollama
- Docker:
- Apple Silicon installation: https://docs.docker.com/desktop/install/mac-install/
- Docker Desktop documentation: https://docs.docker.com/desktop/