AI Chat
● Live
GPT-OSS-20B on NVIDIA A100
40GB VRAM • 12 vCPUs • ~60 tok/s
Tests Run: 0
Avg Latency: - ms
Throughput: - tok/s
Est. Cost: $0.00
Production Token Test Suite
Test model performance at different output token sizes, which is critical for production capacity planning.
GPT-OSS-20B
NVIDIA A100 | $2.93/hr
Not tested
Gemini 2.5 Flash
Cloud API | $0.075/1M tok
Not tested
Gemini 2.5 Pro
Cloud API | $1.25/1M tok
Not tested
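A minimal sketch of such an output-size sweep for the self-hosted model follows. It assumes the `/v1/completions` endpoint shown on this page returns an OpenAI-style `usage.completion_tokens` field and that `curl` and `jq` are installed. The function names (`build_payload`, `run_one`, `sweep`) and the token sizes are hypothetical, not the dashboard's actual test set.

```shell
#!/usr/bin/env bash
# Hypothetical output-size sweep (not the dashboard's own test runner).
# Assumes an OpenAI-style response body containing .usage.completion_tokens.
ENDPOINT="${ENDPOINT:-https://vllm.nexgai.com/v1/completions}"
MODEL="${MODEL:-gpt-oss-20b}"

# Build the JSON request body for a given max_tokens value.
build_payload() {
  local max_tokens="$1"
  printf '{"prompt":"Explain neural networks","model":"%s","max_tokens":%d}' "$MODEL" "$max_tokens"
}

# Send one request; print "<max_tokens> <completion_tokens> <seconds> <tok/s>".
run_one() {
  local max_tokens="$1" tmp seconds tokens
  tmp="$(mktemp)"
  # -o saves the body to a file; -w prints curl's total transfer time in seconds.
  seconds="$(curl -sS -o "$tmp" -w '%{time_total}' -X POST "$ENDPOINT" \
    -H "Content-Type: application/json" -d "$(build_payload "$max_tokens")")"
  # Fall back to 0 if the response has no usage block (e.g. an error body).
  tokens="$(jq -r '.usage.completion_tokens // 0' "$tmp")"
  rm -f "$tmp"
  awk -v m="$max_tokens" -v t="$tokens" -v s="$seconds" \
    'BEGIN { printf "%d %d %.2f %.1f\n", m, t, s, (s + 0 > 0 ? t / s : 0) }'
}

# Run one request per output size.
sweep() {
  local size
  for size in 128 256 512 1024 2048; do
    run_one "$size"
  done
}
```

Source the file and call `sweep`. Because `time_total` includes network and prompt-processing time, the reported tok/s is end-to-end throughput rather than pure decode speed.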
Custom Test Configuration
Developer API Test
```shell
curl -X POST https://vllm.nexgai.com/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Your prompt","model":"gpt-oss-20b","max_tokens":512}'
```
Performance Comparison Results
| Time | Model | Tokens | Latency (ms) | Tok/s | Cost (USD) | Status |
|---|---|---|---|---|---|---|
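The page does not state how the Cost column is derived. One plausible sketch assumes GPU rows are billed by wall-clock time at the hourly rate and cloud rows by token count at the listed per-million price. Real cloud pricing usually differs for input and output tokens, which this ignores. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical per-request cost estimates (not the dashboard's actual formula).

# Self-hosted GPU: pay for wall-clock time. Args: latency_ms hourly_usd
gpu_request_cost() {
  awk -v ms="$1" -v rate="$2" 'BEGIN { printf "%.6f\n", (ms / 1000) * rate / 3600 }'
}

# Cloud API: pay per token at a flat price. Args: tokens usd_per_million
api_request_cost() {
  awk -v tok="$1" -v price="$2" 'BEGIN { printf "%.6f\n", tok * price / 1000000 }'
}
```

For example, a 512-token reply at roughly 60 tok/s takes about 8.5 s, so `gpu_request_cost 8500 2.93` gives about $0.0069, while `api_request_cost 512 1.25` gives $0.00064. The GPU figure assumes the instance serves one request at a time; concurrent requests batched on the same GPU share the hourly cost.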
Active GPU: NVIDIA A100 (run tests here)
🟢 NVIDIA A100 - LIVE
40GB VRAM | GPT-OSS-20B | $2.93/hr
Copy Code to Test
```shell
curl -X POST https://vllm.nexgai.com/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Explain neural networks","model":"gpt-oss-20b","max_tokens":256}'
```
Other GPU Options (Not Currently Deployed)
NVIDIA A100 x2
$5.86/hr
• 80GB VRAM (2x40GB)
• ~120-160 tokens/sec
⚠️ Contact admin to deploy
NVIDIA T4
$0.35/hr
• 16GB GDDR6 VRAM
• ~20-30 tokens/sec
⚠️ Contact admin to deploy
NVIDIA H100
$12.50/hr
• 80GB HBM3 VRAM
• ~150-200 tokens/sec
⚠️ Contact admin to deploy
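To compare these options per token, divide the hourly price by the tokens generated per hour: cost per 1M tokens = hourly rate / (tok/s × 3600) × 1,000,000. The sketch below assumes the low end of each quoted throughput range and a single request stream; `cost_per_million` and `compare_gpus` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: GPU cost per 1M generated tokens at a sustained rate.
# Args: hourly_usd tokens_per_second
cost_per_million() {
  local hourly="$1" tok_per_sec="$2"
  awk -v h="$hourly" -v r="$tok_per_sec" \
    'BEGIN { printf "%.2f\n", h / (r * 3600) * 1000000 }'
}

# Uses the low end of each throughput figure quoted on this page.
compare_gpus() {
  printf '%-10s %12s\n' "GPU" "USD/1M tok"
  printf '%-10s %12s\n' "A100"    "$(cost_per_million 2.93 60)"
  printf '%-10s %12s\n' "A100 x2" "$(cost_per_million 5.86 120)"
  printf '%-10s %12s\n' "T4"      "$(cost_per_million 0.35 20)"
  printf '%-10s %12s\n' "H100"    "$(cost_per_million 12.50 150)"
}
```

With these inputs the A100 and A100 x2 both work out to about $13.56 per 1M tokens, the T4 to about $4.86, and the H100 to about $23.15. vLLM batches concurrent requests, so if the quoted rates are single-stream figures, aggregate throughput under load would be higher and these per-token costs lower.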
Cost Calculator
Hourly Cost: $2.93
Daily Cost (24 h): $70.32
Monthly Cost (30 days): $2,109.60
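The calculator's figures follow from the hourly rate: $2.93 × 24 = $70.32 per day, and $70.32 × 30 = $2,109.60 per month. A sketch of the same projection, assuming a 30-day month; the optional hours-per-day argument is a hypothetical addition for instances that are not always on.

```shell
#!/usr/bin/env bash
# Hypothetical reimplementation of the Cost Calculator.
# Args: hourly_usd [hours_per_day, default 24]. Assumes a 30-day month.
project_costs() {
  local hourly="$1" hours="${2:-24}"
  awk -v h="$hourly" -v d="$hours" 'BEGIN {
    printf "Hourly Cost: $%.2f\n", h
    printf "Daily Cost: $%.2f\n", h * d
    printf "Monthly Cost: $%.2f\n", h * d * 30
  }'
}
```

For example, `project_costs 2.93 8` projects an 8-hour daily schedule: $23.44 per day and $703.20 per month.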
🏀 NBA Analytics - Custom Data Upload
Drop your NBA data file here
Supports CSV and JSON files (player stats, game logs)
Quick NBA Queries
Top Scorers Analysis
Player Comparison
Team Performance
Game Prediction
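The upload format is not specified beyond CSV or JSON. As a sketch of preprocessing that a "Top Scorers Analysis" query might need, the function below ranks players by points per game. It assumes a CSV with the hypothetical header `player,team,games,points` and no quoted commas inside fields; `top_scorers` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical preprocessing for a top-scorers query.
# Assumes a simple CSV (no quoted commas) with header: player,team,games,points
# Args: csv_file [how_many, default 5]. Prints player,team,points_per_game.
top_scorers() {
  local csv="$1" n="${2:-5}"
  # Skip the header and players with zero games, then compute points per game.
  awk -F',' 'NR > 1 && $3 > 0 { printf "%s,%s,%.1f\n", $1, $2, $4 / $3 }' "$csv" \
    | LC_ALL=C sort -t',' -k3,3nr \
    | head -n "$n"
}
```

The ranked lines can then be placed in the `prompt` field of a completion request.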
API Endpoints
Text Completion
```shell
curl -X POST https://vllm.nexgai.com/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is machine learning?",
    "model": "gpt-oss-20b",
    "max_tokens": 256,
    "temperature": 0.7,
    "persona": "default"
  }'
```
Chat Completion
```shell
curl -X POST https://vllm.nexgai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 256
  }'
```
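Writing the JSON body by hand breaks as soon as a prompt contains quotes or newlines. A minimal sketch that lets `jq` handle the escaping and extracts the reply from an OpenAI-style response (`choices[0].message.content`). The names `chat_payload`, `extract_reply`, `chat`, `CHAT_URL`, and `MODEL` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical chat helper for the /v1/chat/completions endpoint.
# Requires curl and jq; assumes an OpenAI-style response body.
CHAT_URL="${CHAT_URL:-https://vllm.nexgai.com/v1/chat/completions}"

# Build the request body with jq so quotes and newlines in the prompt are escaped.
# Args: prompt [max_tokens, default 256]
chat_payload() {
  jq -nc --arg model "${MODEL:-gpt-oss-20b}" --arg content "$1" --argjson max "${2:-256}" \
    '{model: $model, messages: [{role: "user", content: $content}], max_tokens: $max}'
}

# Print only the assistant reply from a chat completion response on stdin.
extract_reply() {
  jq -r '.choices[0].message.content'
}

# Send a single-turn chat request and print the reply.
chat() {
  curl -sS -X POST "$CHAT_URL" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "${2:-256}")" | extract_reply
}
```

Example usage: `chat 'Summarize "neural networks" in one line' 128`.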
Grafana Dashboards
Open Grafana in a new tab to view the dashboards.
Production Stats (All Time)
Total Requests: -
Total Tokens: -
Avg Latency: - ms
Est. Cost: $-
Model Breakdown
| Model | Requests | Tokens | Avg Latency (ms) | Tokens/sec | Est. Cost (USD) |
|---|---|---|---|---|---|
Recent API Requests (Live Log)
Auto-refreshes every 30s
| Time | Model | Input | Output | Latency (ms) | Status |
|---|---|---|---|---|---|
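The Model Breakdown table can be derived from the request log. A sketch, assuming the log is exported as CSV with the header `time,model,input,output,latency_ms,status`, where `input` and `output` are token counts (an assumption; the page does not define those columns). It counts every row regardless of status and omits cost, since pricing differs per model. `model_breakdown` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical rebuild of the Model Breakdown table from a request log.
# Assumes a CSV log with header: time,model,input,output,latency_ms,status
model_breakdown() {
  awk -F',' '
    NR == 1 { next }                      # skip header row
    {
      req[$2]++
      tok[$2] += $3 + $4                  # input + output tokens
      out[$2] += $4
      lat[$2] += $5
    }
    END {
      printf "%-20s %8s %8s %12s %10s\n", "model", "requests", "tokens", "avg_lat_ms", "tok/s"
      for (m in req) {                    # row order is unspecified
        tps = (lat[m] > 0) ? out[m] / (lat[m] / 1000) : 0
        printf "%-20s %8d %8d %12.0f %10.1f\n", m, req[m], tok[m], lat[m] / req[m], tps
      }
    }' "$1"
}
```

Here tok/s is total output tokens divided by total latency for that model.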
Session Metrics (This Browser Session)
Session Requests: 0
Session Tokens: 0
Avg Latency: - ms
Avg Throughput: - tok/s
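"Avg Throughput" can mean total tokens divided by total time, or the mean of each request's own tok/s. A sketch that reports both, assuming one `output_tokens latency_ms` pair per line on stdin; `session_metrics` is a hypothetical name, and the dashboard may use either definition.

```shell
#!/usr/bin/env bash
# Hypothetical session summary.
# Reads "output_tokens latency_ms" pairs on stdin, one line per request.
session_metrics() {
  awk '
    $2 > 0 {                              # skip rows without a positive latency
      n++
      tokens += $1
      ms += $2
      rate_sum += $1 / ($2 / 1000)        # per-request tok/s
    }
    END {
      if (n == 0) { print "no requests"; exit }
      printf "requests=%d tokens=%d avg_latency_ms=%.0f\n", n, tokens, ms / n
      printf "throughput_total=%.1f\n", tokens / (ms / 1000)
      printf "throughput_mean=%.1f\n", rate_sum / n
    }'
}
```

For example, a 600-token reply in 10 s (60 tok/s) and a 50-token reply in 2 s (25 tok/s) give 54.2 tok/s as a ratio of totals but 42.5 tok/s as a simple mean. Short requests spend a larger share of their latency on fixed overhead, so they typically pull the simple mean down, while the ratio of totals weights each request by its duration.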