Docker

warning

🚧 Cortex.cpp is currently in development. The documentation describes the intended functionality, which may not yet be fully implemented.

Setting Up Cortex with Docker​

This guide walks you through setting up and running Cortex with Docker.

Prerequisites​

  • Docker or Docker Desktop
  • nvidia-container-toolkit (for GPU support)

Setup Instructions​

Build Cortex Docker Image from source or Pull from Docker Hub​

Pull Cortex Docker Image from Docker Hub​

# Pull the latest image
docker pull menloltd/cortex:latest
# Pull a specific version
docker pull menloltd/cortex:nightly-1.0.1-224
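
After pulling, a quick check with the standard Docker CLI confirms the image is available locally:

# List local Cortex images and their tags
docker images menloltd/cortex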

Build and Run Cortex Docker Container from Dockerfile​
  1. Clone the Cortex Repository


    git clone https://github.com/janhq/cortex.cpp.git
    cd cortex.cpp
    git submodule update --init

  2. Build the Docker Image


docker build -t cortex --build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) -f docker/Dockerfile .
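
If you also want the local tag to record which commit you built, a small optional variation of the same command looks like this (the cortex:<short-hash> tag name is just an example):

# Optional: also tag the image with the short commit hash for traceability
docker build -t cortex:$(git rev-parse --short HEAD) --build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) -f docker/Dockerfile .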

Run Cortex Docker Container​

  1. Run the Docker Container
  • Create a Docker volume to store models and data:


    docker volume create cortex_data


    # requires nvidia-container-toolkit
    docker run --gpus all -it -d --name cortex -v cortex_data:/root/cortexcpp -p 39281:39281 cortex
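
  • To run without a GPU, drop the --gpus flag. This is a CPU-only sketch, assuming the image falls back to CPU inference when no GPU is passed through:


    # CPU-only run: same volume and port mapping, no GPU passthrough (assumption: CPU fallback)
    docker run -it -d --name cortex -v cortex_data:/root/cortexcpp -p 39281:39281 cortex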

  2. Check Logs (Optional)


    docker logs cortex
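
  • To keep the log stream open while you work through the next steps, follow it instead:


    # Stream logs continuously (Ctrl+C to stop following)
    docker logs --follow cortex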

  3. Access the Cortex API documentation in your browser at http://localhost:39281 (the port published by the run command above)

  4. Access the Container and Try the Cortex CLI


    docker exec -it cortex bash
    cortex --help

Usage​

With the container running, you can use the following commands to interact with Cortex. Make sure curl is installed on your machine.

1. List Available Engines​


curl --request GET --url http://localhost:39281/v1/engines --header "Content-Type: application/json"

  • Example Response

    {
      "data": [
        {
          "description": "This extension enables chat completion API calls using the Onnx engine",
          "format": "ONNX",
          "name": "onnxruntime",
          "status": "Incompatible"
        },
        {
          "description": "This extension enables chat completion API calls using the LlamaCPP engine",
          "format": "GGUF",
          "name": "llama-cpp",
          "status": "Ready",
          "variant": "linux-amd64-avx2",
          "version": "0.1.37"
        }
      ],
      "object": "list",
      "result": "OK"
    }
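
  • If you have jq installed, you can filter the response down to just the engine names and statuses (the field names come from the example response above):


    # Show each engine's name and status on one line
    curl -s http://localhost:39281/v1/engines | jq -r '.data[] | "\(.name): \(.status)"'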

2. Pull Models from Hugging Face​

  • Open a terminal and run websocat ws://localhost:39281/events to capture download events. Install websocat first if you don't have it (see the websocat project's installation instructions).

  • In another terminal, pull models using the commands below.


    curl --request POST --url http://localhost:39281/v1/models/pull --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'

  • After the model has been pulled successfully, run the command below to list the available models.


    curl --request GET --url http://localhost:39281/v1/models
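
  • To filter that list from the shell, a jq sketch such as the one below can help. The .data[].model path is an assumption modelled on the engines response shape; check the raw JSON and adjust the field name if your version differs:


    # Print model identifiers only (field name is an assumption; inspect the raw JSON if it differs)
    curl -s http://localhost:39281/v1/models | jq '.data[].model'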

3. Start a Model and Send an Inference Request​

  • Start the model:


    curl --request POST --url http://localhost:39281/v1/models/start --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'

  • Send an inference request:


    curl --request POST --url http://localhost:39281/v1/chat/completions --header 'Content-Type: application/json' --data '{
      "frequency_penalty": 0.2,
      "max_tokens": 4096,
      "messages": [{"content": "Tell me a joke", "role": "user"}],
      "model": "tinyllama:gguf",
      "presence_penalty": 0.6,
      "stop": ["End"],
      "stream": true,
      "temperature": 0.8,
      "top_p": 0.95
    }'
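
  • The endpoint follows the OpenAI chat-completions format, so a non-streaming variant is easier to inspect from the shell. The sketch below assumes the standard choices[0].message.content response shape:


    # Non-streaming request: fetch the full response, then extract the assistant's reply
    curl -s --request POST --url http://localhost:39281/v1/chat/completions \
      --header 'Content-Type: application/json' \
      --data '{
        "model": "tinyllama:gguf",
        "messages": [{"content": "Tell me a joke", "role": "user"}],
        "stream": false,
        "max_tokens": 256
      }' | jq -r '.choices[0].message.content'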

4. Stop a Model​

  • To stop a running model, use:

    curl --request POST --url http://localhost:39281/v1/models/stop --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
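
  • When you're done experimenting, stop and remove the container. The cortex_data volume keeps downloaded models unless you remove it as well:


    # Stop and remove the container; the volume (and any pulled models) survives
    docker stop cortex && docker rm cortex
    # Only if you also want to delete downloaded models and data:
    docker volume rm cortex_data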