Deploying DeepSeek-R1 on Nebula Block Instance

This tutorial walks you through deploying DeepSeek-R1 on a Nebula Block instance. By following these steps, you will have a fully operational DeepSeek-R1-Distill-Qwen-1.5B model running on a GPU-powered instance, accessible via a web interface.

1. Setting Up a Serverless Instance on Nebula Block

Nebula Block provides a serverless environment that allows you to deploy and manage instances without worrying about the underlying infrastructure. Follow these steps to set up your instance:

1.1 Sign Up & Create an Instance

Create an Account: Visit Nebula Block and sign up for a new account if you don’t already have one. New users who sign up with a referral code receive $10 in initial credits to explore the platform (reach out to us if you don’t have a code).

Upgrade Account: Deposit $10 to upgrade to Engineer Tier 3 and unlock GPU instances. For more details, visit the Nebula Block Tier Overview.

Deploy an Instance: Navigate to the Instances section in the left panel.

Click on the “Deploy” button to create a new instance.

1.2 Configure the Instance

GPU Selection:
Choose a GPU instance type based on your model’s requirements. For the DeepSeek-R1-Distill-Qwen-1.5B model, we recommend using an RTX 4090 GPU.

Operating System:
Select Ubuntu Server 22.04 LTS R550 CUDA 12.4 with Docker as the operating system. This ensures compatibility with GPU-accelerated workloads.

SSH Public Key:
Add your SSH public key for secure access to the instance. If you don’t have one, you can generate it with a tool like ssh-keygen, then use the “+” button to save it.
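
For example, a minimal way to generate a key pair locally (the key type and comment below are just one reasonable choice):

# Generate an Ed25519 key pair (press Enter to accept the default path)
ssh-keygen -t ed25519 -C "nebula-block"

# Print the public key so you can paste it into the Nebula Block console
cat ~/.ssh/id_ed25519.pub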

Instance Name:
Assign a name to your instance for easy identification.

Deploy:
Once all fields are filled, click “Deploy”. Ensure your account has sufficient credits to proceed.

2. Accessing the Instance

After deployment, you can view the instance details in the Instances section.

Use the provided Public IP and your SSH key to log into the instance via a terminal:

ssh -i /path/to/your/private_key.pem username@PUBLIC_IP

Here’s what each part means:

  • -i /path/to/your/private_key.pem: Specifies the path to your SSH private key (replace with the actual path to your key).
  • username: Replace with the appropriate username for your instance (e.g., ubuntu for Ubuntu instances or ec2-user for Amazon EC2 instances).
  • PUBLIC_IP: Replace with the Public IP address of your instance.
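
For example, assuming the default key path from the step above and the ubuntu user (the IP address below is a placeholder, not a real address):

ssh -i ~/.ssh/id_ed25519 ubuntu@203.0.113.10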

3. Checking GPU Drivers and Docker Service

Once logged in, verify that the GPU drivers and Docker service are running correctly:

3.1 Check GPU Status

nvidia-smi

If the output displays GPU details, the drivers are correctly installed.

3.2 Check Docker Status

systemctl status docker

Ensure that the Docker service is active and running.
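
To confirm that containers can also reach the GPU, you can run nvidia-smi inside a disposable CUDA container (a quick sanity check; the image tag below is one assumption, and any CUDA base image compatible with your driver works):

docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi

If this prints the same GPU table as on the host, Docker’s GPU passthrough is working.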

4. Deploying the DeepSeek-R1 Model with vLLM API

We’ll use vLLM, a high-throughput and memory-efficient inference engine for large language models, to deploy the DeepSeek-R1-Distill-Qwen-1.5B model. This service will automatically pull and load the model.

Run the following Docker command to start the service:

docker run -dit --name vllm --restart=always --gpus all -p 38000:8000 swanchain254/vllm-deepseek

Environment Variables: You can specify additional environment variables for the container:

  • MODEL_NAME: The model to run (default is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
  • HUGGING_FACE_HUB_TOKEN: Your Hugging Face token (required for some models).

Example:

docker run -dit --name vllm --restart=always --gpus all -p 38000:8000 \
  -e MODEL_NAME=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  -e HUGGING_FACE_HUB_TOKEN=your_token_here \
  swanchain254/vllm-deepseek

Monitor Logs: To check if the service is running properly, view the logs:

docker logs -f vllm

You should see logs indicating that the model has been successfully loaded and that the API is ready to accept requests.
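
You can also test the API directly from the terminal. The requests below assume the container exposes vLLM’s standard OpenAI-compatible endpoints on port 38000 (replace PUBLIC_IP with your instance’s IP):

# List the models served by the API
curl http://PUBLIC_IP:38000/v1/models

# Send a test chat completion request
curl http://PUBLIC_IP:38000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64
  }'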

5. Setting Up OpenWebUI for Interactive Use

To interact with the model via a web interface, deploy OpenWebUI and connect it to the vLLM API.

5.1 Deploy OpenWebUI

Run the following Docker command to start OpenWebUI:

docker run -d -p 43000:8080 \
  -e OPENAI_API_BASE_URL=http://PUBLIC_IP:38000/v1 \
  -v /opt/webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Replace PUBLIC_IP with the IP address of your Nebula Block instance.

Note: If the image is not found locally, it will be automatically pulled from the remote registry.
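
To verify that OpenWebUI started correctly, check the container’s status and logs:

docker ps --filter name=open-webui
docker logs -f open-webui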

5.2 Access the Web Interface

Open your browser and navigate to:

http://PUBLIC_IP:43000

You can now interact with the DeepSeek-R1-Distill-Qwen-1.5B model through the OpenWebUI interface.

Conclusion

You have successfully deployed DeepSeek-R1-Distill-Qwen-1.5B on a Nebula Block instance. The model is now accessible via a web interface, making it easy to interact with and integrate into your applications. Happy coding!

