AI Chat

GPT-OSS-20B on NVIDIA A100
40GB VRAM • 12 vCPUs • ~60 tok/s

GPT-OSS-20B Ready

Running on NVIDIA A100 GPU with 40GB VRAM

Quick prompts: What is ML? | Python Code | Transformers | NBA Analysis
Tests Run: 0
Avg Latency: - ms
Throughput: - tok/s
Est. Cost: $0.00
Production Token Test Suite

Test model performance at different output token counts, which is critical for production capacity planning.
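The economics this suite measures can be sketched in a few lines; a minimal sketch assuming the $2.93/hr A100 rate listed below, with an illustrative latency (not a benchmark result):

```python
# Sketch: per-request throughput and GPU cost at a given output size.
# HOURLY_RATE is the A100 rate from this page; latency values are illustrative.

HOURLY_RATE = 2.93  # USD per hour for the A100


def request_metrics(output_tokens: int, latency_s: float) -> dict:
    """Tokens/sec and estimated GPU cost for a single request."""
    return {
        "tok_per_s": output_tokens / latency_s,
        "cost_usd": HOURLY_RATE * latency_s / 3600,
    }


# At roughly 60 tok/s, a 512-token completion takes about 8.5 s:
m = request_metrics(512, 8.5)
```

Running the same calculation at several `max_tokens` settings is what turns a latency test into a capacity plan.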

Test All Models
Model | Type | Backend | Price | Status
GPT-OSS-20B | GPU | NVIDIA A100 | $2.93/hr | Not tested
Gemini 2.5 Flash | CLOUD | Cloud API | $0.075/1M tok | Not tested
Gemini 2.5 Pro | PRO | Cloud API | $1.25/1M tok | Not tested
Custom Test Configuration
Developer API Test
curl -X POST https://vllm.nexgai.com/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Your prompt","model":"gpt-oss-20b","max_tokens":512}'
API Response
Click "Run Test" to see response...
Performance Comparison Results
Time | Model | Tokens | Latency | Tok/s | Cost | Status

Active GPU: NVIDIA A100 - Run Tests Here

🟢 NVIDIA A100 - LIVE
40GB VRAM | GPT-OSS-20B | $2.93/hr
Copy Code to Test
curl -X POST https://vllm.nexgai.com/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Explain neural networks","model":"gpt-oss-20b","max_tokens":256}'
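The same request can be issued from Python; a minimal stdlib sketch mirroring the curl command above (endpoint, model name, and payload are taken from this page; error handling and authentication are omitted):

```python
import json
import urllib.request

# Same payload as the curl example above.
payload = {
    "prompt": "Explain neural networks",
    "model": "gpt-oss-20b",
    "max_tokens": 256,
}


def build_request(
    url: str = "https://vllm.nexgai.com/v1/completions",
) -> urllib.request.Request:
    """Build the POST request; actually sending it requires network access."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request()
# To send it: body = urllib.request.urlopen(req).read()
```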

Other GPU Options (Not Currently Deployed)

NVIDIA A100 x2 $5.86/hr
• 80GB VRAM (2x40GB)
• ~120-160 tokens/sec
⚠️ Contact admin to deploy
NVIDIA T4 $0.35/hr
• 16GB GDDR6 VRAM
• ~20-30 tokens/sec
⚠️ Contact admin to deploy
NVIDIA H100 $12.50/hr
• 80GB HBM3 VRAM
• ~150-200 tokens/sec
⚠️ Contact admin to deploy
Cost Calculator
Hourly Cost: $2.93
Daily Cost: $70.32
Monthly Cost: $2,109.60
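The calculator's figures follow directly from the hourly rate; a quick check in Python, assuming 24/7 uptime and a 30-day month (which is what the numbers above imply):

```python
# A100 rate from the calculator above.
HOURLY = 2.93

daily = round(HOURLY * 24, 2)    # 24/7 uptime
monthly = round(daily * 30, 2)   # 30-day month
```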
🏀 NBA Analytics - Custom Data Upload
Drop your NBA data file here

Supports CSV, JSON (player stats, game logs)

Quick NBA Queries
Top Scorers Analysis
Player Comparison
Team Performance
Game Prediction
API Endpoints

Text Completion

curl -X POST https://vllm.nexgai.com/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is machine learning?",
    "model": "gpt-oss-20b",
    "max_tokens": 256,
    "temperature": 0.7,
    "persona": "default"
  }'

Chat Completion

curl -X POST https://vllm.nexgai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 256
  }'
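Both endpoints return OpenAI-compatible JSON. A sketch of extracting the reply and token usage from a chat completion; the response body here is hand-written to illustrate the schema, not actual server output:

```python
import json

# Illustrative response in the OpenAI-compatible shape a vLLM server returns;
# all field values below are made up for the example.
raw = json.dumps({
    "id": "cmpl-example",
    "model": "gpt-oss-20b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11},
})

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]
tokens_used = resp["usage"]["total_tokens"]
```

For the `/v1/completions` endpoint the text lives at `choices[0]["text"]` instead of inside a `message` object.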
Live API Test
Click "Run Test" to see response...
Production Stats (All Time)
Total Requests: -
Total Tokens: -
Avg Latency: - ms
Est. Cost: $-
Model Breakdown
Model | Requests | Tokens | Avg Latency | Tokens/sec | Est. Cost
Recent API Requests (Live Log): auto-refreshes every 30s
Time | Model | Input | Output | Latency | Status
Session Metrics (This Browser Session)
Session Requests: 0
Session Tokens: 0
Avg Latency: - ms
Avg Throughput: - tok/s
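The session averages above can be derived from a per-request log; a minimal sketch with made-up sample requests (token counts and latencies are illustrative):

```python
# Each entry: (output_tokens, latency_ms). Sample values are illustrative.
requests_log = [(256, 4200), (128, 2100), (512, 8400)]

session_requests = len(requests_log)
session_tokens = sum(tok for tok, _ in requests_log)
avg_latency_ms = sum(ms for _, ms in requests_log) / session_requests
# Per-request throughput in tokens/sec, averaged across the session.
avg_tok_per_s = (
    sum(tok / (ms / 1000) for tok, ms in requests_log) / session_requests
)
```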