AI Chat

GPT-OSS-20B on NVIDIA A100
40GB VRAM • 12 vCPUs • ~60 tok/s

GPT-OSS-20B Ready

Running on NVIDIA A100 GPU with 40GB VRAM

Quick prompts: What is ML? | Python Code | Transformers | NBA Analysis
Tests Run: 0
Avg Latency: - ms
Throughput: - tok/s
Est. Cost: $0.00
Production Token Test Suite

Test model performance at different output token counts, which is critical for production capacity planning.
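The economics this suite measures can be sketched in a few lines; a minimal sketch assuming the $2.93/hr A100 rate listed below, with an illustrative latency (not a benchmark result):

```python
# Sketch: per-request throughput and GPU cost at a given output size.
# HOURLY_RATE is the A100 rate from this page; latency values are illustrative.

HOURLY_RATE = 2.93  # USD per hour for the A100


def request_metrics(output_tokens: int, latency_s: float) -> dict:
    """Tokens/sec and estimated GPU cost for a single request."""
    return {
        "tok_per_s": output_tokens / latency_s,
        "cost_usd": HOURLY_RATE * latency_s / 3600,
    }


# At roughly 60 tok/s, a 512-token completion takes about 8.5 s:
m = request_metrics(512, 8.5)
```

Running the same calculation at several `max_tokens` settings is what turns a latency test into a capacity plan.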

Test All Models
Model | Type | Backend | Price | Status
GPT-OSS-20B | GPU | NVIDIA A100 | $2.93/hr | Not tested
Gemini 2.5 Flash | CLOUD | Cloud API | $0.075/1M tok | Not tested
Gemini 2.5 Pro | PRO | Cloud API | $1.25/1M tok | Not tested
Custom Test Configuration
Developer API Test
curl -X POST https://vllm.nexgai.com/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Your prompt","model":"gpt-oss-20b","max_tokens":512}'
API Response
Click "Run Test" to see response...
Performance Comparison Results
Time | Model | Tokens | Latency | Tok/s | Cost | Status

Active GPU: NVIDIA A100 - Run Tests Here

🟢 NVIDIA A100 - LIVE
40GB VRAM | GPT-OSS-20B | $2.93/hr
Copy Code to Test
curl -X POST https://vllm.nexgai.com/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Explain neural networks","model":"gpt-oss-20b","max_tokens":256}'
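The same request can be issued from Python; a minimal stdlib sketch mirroring the curl command above (endpoint, model name, and payload are taken from this page; error handling and authentication are omitted):

```python
import json
import urllib.request

# Same payload as the curl example above.
payload = {
    "prompt": "Explain neural networks",
    "model": "gpt-oss-20b",
    "max_tokens": 256,
}


def build_request(
    url: str = "https://vllm.nexgai.com/v1/completions",
) -> urllib.request.Request:
    """Build the POST request; actually sending it requires network access."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request()
# To send it: body = urllib.request.urlopen(req).read()
```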

Other GPU Options (Not Currently Deployed)

NVIDIA A100 x2 $5.86/hr
• 80GB VRAM (2x40GB)
• ~120-160 tokens/sec
⚠️ Contact admin to deploy
NVIDIA T4 $0.35/hr
• 16GB GDDR6 VRAM
• ~20-30 tokens/sec
⚠️ Contact admin to deploy
NVIDIA H100 $12.50/hr
• 80GB HBM3 VRAM
• ~150-200 tokens/sec
⚠️ Contact admin to deploy
Cost Calculator
Hourly Cost: $2.93
Daily Cost: $70.32
Monthly Cost: $2,109.60
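The calculator's figures follow directly from the hourly rate; a quick check in Python, assuming 24/7 uptime and a 30-day month (which is what the numbers above imply):

```python
# A100 rate from the calculator above.
HOURLY = 2.93

daily = round(HOURLY * 24, 2)    # 24/7 uptime
monthly = round(daily * 30, 2)   # 30-day month
```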
🏀 NBA Analytics - Custom Data Upload
Drop your NBA data file here

Supports CSV, JSON (player stats, game logs)

Quick NBA Queries
Top Scorers Analysis
Player Comparison
Team Performance
Game Prediction
API Endpoints

Text Completion

curl -X POST https://vllm.nexgai.com/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is machine learning?",
    "model": "gpt-oss-20b",
    "max_tokens": 256,
    "temperature": 0.7,
    "persona": "default"
  }'

Chat Completion

curl -X POST https://vllm.nexgai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 256
  }'
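Both endpoints return OpenAI-compatible JSON. A sketch of extracting the reply and token usage from a chat completion; the response body here is hand-written to illustrate the schema, not actual server output:

```python
import json

# Illustrative response in the OpenAI-compatible shape a vLLM server returns;
# all field values below are made up for the example.
raw = json.dumps({
    "id": "cmpl-example",
    "model": "gpt-oss-20b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11},
})

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]
tokens_used = resp["usage"]["total_tokens"]
```

For the `/v1/completions` endpoint the text lives at `choices[0]["text"]` instead of inside a `message` object.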
Live API Test
Click "Run Test" to see response...
Production Stats (All Time)
Total Requests: -
Total Tokens: -
Avg Latency: - ms
Est. Cost: $-
Model Breakdown
Model | Requests | Tokens | Avg Latency | Tokens/sec | Est. Cost
Recent API Requests (Live Log): auto-refreshes every 30s
Time | Model | Input | Output | Latency | Status
Session Metrics (This Browser Session)
Session Requests: 0
Session Tokens: 0
Avg Latency: - ms
Avg Throughput: - tok/s
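The session averages above can be derived from a per-request log; a minimal sketch with made-up sample requests (token counts and latencies are illustrative):

```python
# Each entry: (output_tokens, latency_ms). Sample values are illustrative.
requests_log = [(256, 4200), (128, 2100), (512, 8400)]

session_requests = len(requests_log)
session_tokens = sum(tok for tok, _ in requests_log)
avg_latency_ms = sum(ms for _, ms in requests_log) / session_requests
# Per-request throughput in tokens/sec, averaged across the session.
avg_tok_per_s = (
    sum(tok / (ms / 1000) for tok, ms in requests_log) / session_requests
)
```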