/v1/chat/completions Endpoint Reference#
This guide provides a complete breakdown of the /v1/chat/completions endpoint, covering request parameters and response formats to help you get started with AIone's OpenAI-compatible API.
If you plan to use Gemini image generation models or image-specific parameters such as aspect_ratio, image_size, and top_k, please also refer to the Gemini Image Generation guide.
1. Request Parameters#
Required Parameters#
model (required)#
The ID of the model to use, for example:
"model": "claude-sonnet-4-6"
messages (required)#
An array of conversation messages. Each message contains a role and content:
"messages": [
{"role": "system", "content": "You are a professional technical consultant"},
{"role": "user", "content": "Please explain what an API Gateway is"}
]
system: System prompt that defines the AI's behavior and persona
user: The end user's input
assistant: AI response (used for multi-turn conversations)
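In a multi-turn conversation, prior assistant replies are sent back verbatim in the messages array. A minimal sketch (the assistant reply and follow-up question are illustrative):

```python
# Build a multi-turn history: the earlier assistant reply is included so the
# model has context for the follow-up user question.
messages = [
    {"role": "system", "content": "You are a professional technical consultant"},
    {"role": "user", "content": "Please explain what an API Gateway is"},
    {"role": "assistant", "content": "An API Gateway is a single entry point that routes client requests to backend services."},
    {"role": "user", "content": "What are its main drawbacks?"},
]

def last_user_turn(history):
    """Return the most recent user message in the history."""
    return next(m for m in reversed(history) if m["role"] == "user")
```

The full history is re-sent on every request; the API itself is stateless.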
Optional Parameters#
temperature (default: 1.0)#
Controls output randomness, range 0-2:
0: Deterministic output; ideal for code generation and data extraction
0.7: Balanced creativity and consistency; recommended for general use
1.5+: High creativity; suitable for creative writing
max_tokens#
Maximum number of tokens to generate. If not specified, the model's default value is used.
stream (default: false)#
Whether to enable streaming responses. When set to true, the response is delivered as an SSE data stream.
We strongly recommend enabling stream: true for interactive use cases. This reduces time-to-first-token from 10+ seconds down to 2-3 seconds. Non-streaming requests must wait for the model to generate the entire response before returning, which can easily time out when the model produces longer outputs.
top_p (default: 1.0)#
Nucleus sampling parameter. Typically, you only need to adjust either temperature or top_p, not both.
tools#
Function calling tool definitions, allowing the model to invoke external functions:
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the weather for a specified city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"}
},
"required": ["city"]
}
}
}
]
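When the model decides to call the tool defined above, your code must run the actual function and return its result. A minimal dispatch sketch, where the get_weather implementation and its stub return value are hypothetical:

```python
import json

# Hypothetical local implementation backing the get_weather tool definition.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}  # stub data for illustration

TOOL_HANDLERS = {"get_weather": get_weather}

def dispatch_tool_call(tool_call: dict) -> str:
    """Run the function the model requested and return a JSON string result.

    `tool_call` mirrors the shape of a tool call in the API response:
    the function name plus its arguments as a JSON-encoded string.
    """
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOL_HANDLERS[name](**args)
    return json.dumps(result)
```

The returned JSON string is sent back to the model in a follow-up message so it can compose its final answer.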
response_format#
Specifies the response format; supports JSON mode:
"response_format": {"type": "json_object"}
2. Complete Request Example (Streaming - Recommended)#
{
"model": "claude-sonnet-4-6",
"messages": [
{"role": "system", "content": "You are a helpful assistant. Please answer concisely."},
{"role": "user", "content": "What is a RESTful API? Explain in 3 sentences."}
],
"max_tokens": 500,
"temperature": 0.7,
"stream": true
}
3. Streaming Response Format#
When stream: true is enabled, the response is delivered in SSE (Server-Sent Events) format. Each chunk is a JSON object:
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":"REST"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":"ful"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
choices[0].delta.content: The incremental text fragment in this chunk
choices[0].finish_reason: null means generation is still in progress; stop means the model finished normally; length means max_tokens was reached
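The chunks above can be reassembled client-side by concatenating each delta.content fragment until the [DONE] sentinel. A minimal stdlib-only sketch:

```python
import json

def assemble_sse(lines):
    """Collect delta.content fragments from SSE 'data:' lines into full text.

    Returns the assembled text and the final finish_reason
    ("stop", "length", or None if the stream ended early).
    """
    text = []
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # SSE streams may contain comments or blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choice = chunk["choices"][0]
        if choice.get("delta", {}).get("content"):
            text.append(choice["delta"]["content"])
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]
    return "".join(text), finish_reason
```

Running this over the example chunks above yields the text "RESTful" and finish_reason "stop".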
4. Non-Streaming Response Format#
{
"id": "chatcmpl-abc123def456",
"object": "chat.completion",
"created": 1711500000,
"model": "claude-sonnet-4-6",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "A RESTful API is an interface design style based on the HTTP protocol..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 42,
"completion_tokens": 85,
"total_tokens": 127
}
}
| Field | Description |
|---|---|
| id | Unique request identifier |
| choices[0].message.content | The AI-generated response |
| choices[0].finish_reason | Completion reason: stop (normal), length (reached max_tokens) |
| usage.prompt_tokens | Number of input tokens consumed |
| usage.completion_tokens | Number of output tokens consumed |
| usage.total_tokens | Total tokens (used for billing) |
5. Important Notes#
1. Use streaming mode: For interactive scenarios (chat, IDE coding), always use stream: true for a significantly better experience and stability
2. Authentication: Ensure your API Key is valid and authorized to access the selected model
3. Parameter format: messages is an array; each message must include both role and content
4. Token billing: Input and output tokens are billed separately at different rates
5. Compatibility: AIone is fully compatible with the OpenAI SDK -- use the openai library directly
6. Error handling: Implement exponential backoff for 429 (rate limit) and 5xx (server error) responses
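The exponential-backoff advice in note 6 can be sketched as follows; the delay parameters (base, cap, retry count) are illustrative defaults, not values mandated by the API:

```python
import random
import time

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Yield exponentially growing delays with full jitter, capped at `cap` seconds."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt)) * random.random()

def call_with_retries(send, max_retries: int = 5, base: float = 1.0):
    """Call `send` (returning a (status, body) tuple), retrying on 429 and 5xx.

    Other statuses (including 4xx client errors) are returned immediately,
    since retrying them would not help.
    """
    for delay in backoff_delays(max_retries, base=base):
        status, body = send()
        if status == 429 or status >= 500:
            time.sleep(delay)
            continue
        return status, body
    raise RuntimeError("retries exhausted")
```

Full jitter (multiplying by a random factor) spreads retries from many clients over time, avoiding synchronized retry storms against a rate-limited endpoint.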
Modified at 2026-04-04 16:02:45