/v1/chat/completions Endpoint Reference#
This guide provides a complete breakdown of the /v1/chat/completions endpoint, covering request parameters and response formats to help you get started with AIone's OpenAI-compatible API.
If you plan to use Gemini image generation models or image-specific parameters such as aspect_ratio, image_size, and top_k, please also refer to the Gemini Image Generation guide.
1. Request Parameters#
Required Parameters#
model (required)#
The ID of the model to use, for example:
"model": "claude-sonnet-4-6"
messages (required)#
An array of conversation messages. Each message contains a role and content:
"messages": [
{"role": "system", "content": "You are a professional technical consultant"},
{"role": "user", "content": "Please explain what an API Gateway is"}
]
system: System prompt that defines the AI's behavior and persona
user: The end user's input
assistant: AI response (used for multi-turn conversations)
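In a multi-turn conversation, prior assistant replies are sent back verbatim in the messages array. A minimal sketch (the assistant reply and follow-up question are illustrative):

```python
# Build a multi-turn history: the earlier assistant reply is included so the
# model has context for the follow-up user question.
messages = [
    {"role": "system", "content": "You are a professional technical consultant"},
    {"role": "user", "content": "Please explain what an API Gateway is"},
    {"role": "assistant", "content": "An API Gateway is a single entry point that routes client requests to backend services."},
    {"role": "user", "content": "What are its main drawbacks?"},
]

def last_user_turn(history):
    """Return the most recent user message in the history."""
    return next(m for m in reversed(history) if m["role"] == "user")
```

The full history is re-sent on every request; the API itself is stateless.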
Optional Parameters#
temperature (default: 1.0)#
Controls output randomness, range 0-2:
0: Deterministic output; ideal for code generation and data extraction
0.7: Balanced creativity and consistency; recommended for general use
1.5+: High creativity; suitable for creative writing
max_tokens#
Maximum number of tokens to generate. If not specified, the model's default value is used.
stream (default: false)#
Whether to enable streaming responses. When set to true, the response is delivered as an SSE data stream.
We strongly recommend enabling stream: true for interactive use cases. This reduces time-to-first-token from 10+ seconds down to 2-3 seconds. Non-streaming requests must wait for the model to generate the entire response before returning, which can easily time out when the model produces longer outputs.
top_p (default: 1.0)#
Nucleus sampling parameter. Typically, you only need to adjust either temperature or top_p, not both.
tools#
Function calling tool definitions, allowing the model to invoke external functions:
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the weather for a specified city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"}
},
"required": ["city"]
}
}
}
]
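When the model decides to call the tool defined above, your code must run the actual function and return its result. A minimal dispatch sketch, where the get_weather implementation and its stub return value are hypothetical:

```python
import json

# Hypothetical local implementation backing the get_weather tool definition.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}  # stub data for illustration

TOOL_HANDLERS = {"get_weather": get_weather}

def dispatch_tool_call(tool_call: dict) -> str:
    """Run the function the model requested and return a JSON string result.

    `tool_call` mirrors the shape of a tool call in the API response:
    the function name plus its arguments as a JSON-encoded string.
    """
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOL_HANDLERS[name](**args)
    return json.dumps(result)
```

The returned JSON string is sent back to the model in a follow-up message so it can compose its final answer.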
response_format#
Specifies the response format; supports JSON mode:
"response_format": {"type": "json_object"}
2. Complete Request Example (Streaming - Recommended)#
{
"model": "claude-sonnet-4-6",
"messages": [
{"role": "system", "content": "You are a helpful assistant. Please answer concisely."},
{"role": "user", "content": "What is a RESTful API? Explain in 3 sentences."}
],
"max_tokens": 500,
"temperature": 0.7,
"stream": true
}
3. Streaming Response Format#
When stream: true is enabled, the response is delivered in SSE (Server-Sent Events) format. Each chunk is a JSON object:
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":"REST"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":"ful"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
choices[0].delta.content: The incremental text fragment in this chunk
choices[0].finish_reason: null means generation is still in progress; stop means the model finished normally; length means max_tokens was reached
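The chunks above can be reassembled client-side by concatenating each delta.content fragment until the [DONE] sentinel. A minimal stdlib-only sketch:

```python
import json

def assemble_sse(lines):
    """Collect delta.content fragments from SSE 'data:' lines into full text.

    Returns the assembled text and the final finish_reason
    ("stop", "length", or None if the stream ended early).
    """
    text = []
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # SSE streams may contain comments or blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choice = chunk["choices"][0]
        if choice.get("delta", {}).get("content"):
            text.append(choice["delta"]["content"])
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]
    return "".join(text), finish_reason
```

Running this over the example chunks above yields the text "RESTful" and finish_reason "stop".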
4. Non-Streaming Response Format#
{
"id": "chatcmpl-abc123def456",
"object": "chat.completion",
"created": 1711500000,
"model": "claude-sonnet-4-6",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "A RESTful API is an interface design style based on the HTTP protocol..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 42,
"completion_tokens": 85,
"total_tokens": 127
}
}
| Field | Description |
|---|---|
| id | Unique request identifier |
| choices[0].message.content | The AI-generated response |
| choices[0].finish_reason | Completion reason: stop (normal), length (reached max_tokens) |
| usage.prompt_tokens | Number of input tokens consumed |
| usage.completion_tokens | Number of output tokens consumed |
| usage.total_tokens | Total tokens (used for billing) |
5. Important Notes#
1. Use streaming mode: For interactive scenarios (chat, IDE coding), always use stream: true for a significantly better experience and stability
2. Authentication: Ensure your API Key is valid and authorized to access the selected model
3. Parameter format: messages is an array; each message must include both role and content
4. Token billing: Input and output tokens are billed separately at different rates
5. Compatibility: AIone is fully compatible with the OpenAI SDK -- use the openai library directly
6. Error handling: Implement exponential backoff for 429 (rate limit) and 5xx (server error) responses
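The exponential-backoff advice in note 6 can be sketched as follows; the delay parameters (base, cap, retry count) are illustrative defaults, not values mandated by the API:

```python
import random
import time

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Yield exponentially growing delays with full jitter, capped at `cap` seconds."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt)) * random.random()

def call_with_retries(send, max_retries: int = 5, base: float = 1.0):
    """Call `send` (returning a (status, body) tuple), retrying on 429 and 5xx.

    Other statuses (including 4xx client errors) are returned immediately,
    since retrying them would not help.
    """
    for delay in backoff_delays(max_retries, base=base):
        status, body = send()
        if status == 429 or status >= 500:
            time.sleep(delay)
            continue
        return status, body
    raise RuntimeError("retries exhausted")
```

Full jitter (multiplying by a random factor) spreads retries from many clients over time, avoiding synchronized retry storms against a rate-limited endpoint.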
Modified at 2026-04-04 16:02:45