Model Size Considerations
The performance of Deepseek models depends largely on how much RAM your Mac has (a quick way to check is shown after this list):
- Deepseek R1 8B: Runs smoothly on 16-18GB RAM machines, good balance of performance and resource usage
- Deepseek R1 14B: Operates at slower output speeds but still functional on 16GB RAM; performs significantly better with 32GB RAM
- Deepseek R1 32B: Will not run on lower-RAM machines; at least 32GB of RAM is strongly recommended
- General rule: More RAM provides better performance and allows for larger context windows
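If you're not sure how much memory your Mac has, you can check from Terminal. This is a quick sketch using the built-in sysctl and system_profiler tools; the exact output formatting varies slightly between macOS versions:
# Installed memory in GB (hw.memsize reports bytes)
sysctl -n hw.memsize | awk '{print $1/1073741824 " GB"}'
# Or read it from the hardware overview
system_profiler SPHardwareDataType | grep "Memory:"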
Prerequisites
- Apple Silicon Mac (M2, M3, or M4 series; M1 works but is noticeably slower)
- macOS Sequoia recommended
- RAM requirements:
- Minimum: 16GB RAM
- Recommended: 32GB+ RAM
- Sufficient free storage space (the 8B model is on the order of 5GB; the 14B and 32B variants need considerably more)
- Docker Desktop for Apple Silicon (download link in the Additional Resources section below)
- Ollama (download link in the Additional Resources section below)
- OpenWebUI (installed via Docker in Step 2)
Step 1: Installing Ollama
- Visit Ollama’s official website and download the Apple Silicon version
- Alternative installation via Homebrew:
brew install ollama
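Whichever install method you use, you can confirm the CLI is available before moving on (a minimal check; the version number will differ on your machine):
# Confirm the Ollama CLI is on your PATH and print its version
ollama --version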
Step 2: Installing OpenWebUI
- Install Docker Desktop for Apple Silicon:
- Download from Docker’s official website
- Follow the installation guide for Apple Silicon Macs
- Verify installation by running the following in Terminal:
docker --version
- Pull and run OpenWebUI:
docker pull ghcr.io/open-webui/open-webui:main
docker run -d -p 3000:8080 -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:main
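Before moving on, it's worth confirming the container actually started. This is a minimal check with standard Docker commands, assuming the image tag used above:
# The open-webui container should be listed with port 3000 mapped
docker ps --filter "ancestor=ghcr.io/open-webui/open-webui:main"
# Optionally tail its logs while it finishes starting up (Ctrl+C to stop)
docker logs -f $(docker ps -q --filter "ancestor=ghcr.io/open-webui/open-webui:main")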
Step 3: Installing Deepseek Models
- Choose your model size based on available RAM:
# For 16GB RAM machines (recommended)
ollama run deepseek-r1:8b
# For 32GB RAM machines or if you need more capabilities
ollama run deepseek-r1:14b
ollama run deepseek-r1:32b
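When the download completes, you can verify what's installed and give the model a quick smoke test straight from Terminal (the tag below assumes you pulled the 8B variant; substitute whichever one you chose):
# List locally installed models and their sizes on disk
ollama list
# One-off prompt without opening the UI
ollama run deepseek-r1:8b "Reply with a single short sentence."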
Step 4: Configuration
- Launch Ollama:
Either open the desktop app, or start the server from Terminal:
ollama serve
- Open OpenWebUI in your browser:
http://localhost:3000
- Initial Setup:
- On first launch, you’ll be prompted to create an admin account
- Choose a secure password (this is important even for local installations)
- After logging in, you’ll see the main chat interface
- Configure Backend Settings:
- Click on the settings icon in the left sidebar
- Navigate to “Backend Settings”
- Select “Ollama” as your backend type
- Set the API endpoint to
http://localhost:11434
- Click “Test Connection” to verify (a command-line check is sketched after this section)
- If the test fails, try http://host.docker.internal:11434 instead; from inside the Docker container, localhost refers to the container itself rather than your Mac
- Save your settings
- Model Configuration:
- Go to the “Models” section in settings
- Click “Download New Model”
- Select your preferred Deepseek model based on your RAM:
- For 16GB RAM: deepseek-r1:8b
- For 32GB RAM: deepseek-r1:14b or deepseek-r1:32b
- Configure model parameters:
- Temperature: 0.7 (default, adjust for creativity vs precision)
- Context Length:
- 8B model: up to 8192 tokens
- 14B model: reduce to 4096 tokens on 16GB RAM
- Top P: 0.9 (recommended for code generation)
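If “Test Connection” fails, or you simply want to confirm the backend outside the UI, you can query Ollama's HTTP API directly from your Mac's Terminal. This assumes Ollama is serving on its default port 11434 (note that from inside the Docker container the host is host.docker.internal rather than localhost):
# The root endpoint returns a short status message when the server is up
curl http://localhost:11434
# Lists the models Ollama can serve, as JSON
curl http://localhost:11434/api/tags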
Optimizing Performance
- Close unnecessary applications
- Monitor memory usage with Activity Monitor (or from Terminal, as sketched below)
- macOS manages swap automatically; the command below raises the open-file limit instead, which can help if Ollama or Docker reports “too many open files”:
sudo launchctl limit maxfiles 65535 200000
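If you prefer watching memory from Terminal rather than Activity Monitor, macOS ships two small utilities that work well here; output details vary a little between macOS versions:
# Summary of current memory pressure and free-memory percentage
memory_pressure
# Raw virtual-memory statistics, sampled every 2 seconds (Ctrl+C to stop)
vm_stat 2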
Troubleshooting
Common Issues and Solutions
- Memory Pressure:
- Reduce context length
- Close other resource-intensive applications
- Monitor memory usage in Activity Monitor
- Slow Response Times:
- Check CPU/GPU usage
- Verify no other intensive processes are running
- Consider reducing model parameters
- Connection Issues:
- Verify Ollama is running (ollama serve)
- Check OpenWebUI Docker container status
- Confirm port availability (see the commands below)
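For the connection issues above, a few Terminal checks usually narrow the problem down quickly. This assumes the default ports used earlier in this guide (3000 for OpenWebUI, 11434 for Ollama):
# Is anything listening on the OpenWebUI and Ollama ports?
lsof -iTCP:3000 -sTCP:LISTEN
lsof -iTCP:11434 -sTCP:LISTEN
# Is the OpenWebUI container still running?
docker ps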
Additional Resources and Links
- Ollama:
- Download page: https://ollama.ai/download
- Other models you can add: https://ollama.com/library
- GitHub repository: https://github.com/ollama/ollama
- Docker:
- Apple Silicon installation: https://docs.docker.com/desktop/install/mac-install/
- Docker Desktop documentation: https://docs.docker.com/desktop/