— Cerebras (recommended). Cerebras is the default provider because it offers:
  • Free API access — no credit card required
  • Blazing-fast inference — 3,000+ tokens per second
  • Powerful models — including gpt-oss-120b with a 128k context window
1. Create a Cerebras account — go to cloud.cerebras.ai and sign up for free.
2. Generate an API key — once logged in, navigate to the API Keys section, create a new key, and copy it.
3. Paste it in LayerSense — open LayerSense settings → API tab → paste your key in the “API Key” field.
4. Select a model. Recommended models:
  • gpt-oss-120b — best quality, 128k context, 3,000+ tok/s
  • llama-3.3-70b — balanced performance, 2,100 tok/s
  • qwen-3-32b — fast and efficient, 2,600 tok/s
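Once the key is in place, a chat request to Cerebras looks roughly like the sketch below. Cerebras advertises an OpenAI-compatible API, so the URL and the `build_cerebras_request` helper are illustrative assumptions, not LayerSense's actual code — check the docs at cloud.cerebras.ai for the authoritative endpoint.

```python
import json

def build_cerebras_request(api_key: str, model: str, prompt: str):
    # Assumed OpenAI-compatible endpoint (verify against the Cerebras docs).
    url = "https://api.cerebras.ai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,  # e.g. "gpt-oss-120b"
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, json.dumps(payload)
```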
The Cerebras free tier has generous rate limits. If you hit them, the plugin automatically retries with delays.
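The retry behaviour can be sketched as exponential backoff: wait, double the delay, try again. This is an illustration of the pattern, not LayerSense's internals; `RateLimitError` is a hypothetical stand-in for the provider's HTTP 429 response.

```python
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 (rate-limited) response."""

def call_with_retries(request_fn, max_attempts=4, base_delay=1.0):
    """Retry a rate-limited call with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```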
— Google Gemini (free). Google offers a generous free tier for Gemini models.
1. Get your API key — go to aistudio.google.com and create an API key.
2. Configure LayerSense — settings → API tab → provider: Google Gemini → paste your key.
3. Select a model:
  • gemini-3-pro — newest, most intelligent (preview)
  • gemini-2.5-flash — best price-performance, 1M context window
  • gemini-2.5-pro — advanced thinking model
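For the curious, the raw Gemini REST call behind this setup looks roughly like the sketch below. The endpoint shape follows Google's public `generativelanguage` API; treat the helper as illustrative rather than LayerSense's actual code.

```python
import json

def build_gemini_request(api_key: str, model: str, prompt: str):
    # Gemini's REST API keys the call by model name and passes the API
    # key as a query parameter.
    url = (
        "https://generativelanguage.googleapis.com/v1beta/"
        f"models/{model}:generateContent?key={api_key}"
    )
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, json.dumps(body)
```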
— OpenRouter. OpenRouter aggregates multiple providers and offers some free models.
1. Create an account — go to openrouter.ai and sign up.
2. Get your API key — find it in the dashboard.
3. Configure LayerSense — settings → API tab → provider: OpenRouter → paste your key.
4. Use free models — look for models tagged :free, such as:
  • mistralai/devstral-2512:free — 123B coding specialist
  • meta-llama/llama-3-8b-instruct:free — balanced all-purpose
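OpenRouter exposes an OpenAI-style chat completions endpoint, and the `:free` suffix in the model id is what selects the no-cost variant. A rough sketch of the request (illustrative only, not LayerSense's code):

```python
import json

def build_openrouter_request(api_key: str, model: str, prompt: str):
    # OpenAI-compatible endpoint; auth is a standard Bearer token.
    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,  # e.g. "meta-llama/llama-3-8b-instruct:free"
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, json.dumps(payload)
```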
— Local (LM Studio). Run models on your own computer for complete privacy and no rate limits.
1. Download LM Studio — get it from lmstudio.ai; available for Windows, Mac, and Linux.
2. Download a model — inside LM Studio, search for and download a model. Recommended:
  • Qwen 2.5 (various sizes)
  • Llama 3 8B
  • Mistral 7B
3. Start the local server — in LM Studio, go to the “Local Server” tab and click Start. The default endpoint is http://127.0.0.1:1234/v1/chat/completions.
4. Configure LayerSense — settings → API tab → provider: Local (LM Studio) → verify the endpoint matches your LM Studio server.
Local models need a decent GPU for fast inference; CPU-only works but is slower.
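Because LM Studio's local server speaks the OpenAI chat-completions dialect, a plain HTTP POST to the endpoint from step 3 is all that's needed. A minimal sketch (the `model` field value is arbitrary here; LM Studio serves whichever model is currently loaded):

```python
import json
import urllib.request

def make_local_request(prompt: str,
                       endpoint="http://127.0.0.1:1234/v1/chat/completions"):
    # Build (but do not yet send) a request to the default LM Studio endpoint.
    body = json.dumps({
        "model": "local-model",  # placeholder; the loaded model answers
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        endpoint, data=body,
        headers={"Content-Type": "application/json"},
    )
```

Send it with `urllib.request.urlopen(...)` once the local server is running.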
— Paid providers. If you already have API keys from paid providers:
  • OpenAI — paste your key and select provider “OpenAI”. Recommended models: gpt-5.1 (best for agents), gpt-5-nano (cheapest), gpt-4.1-mini.
  • Anthropic — paste your key and select provider “Anthropic”. Recommended models: claude-sonnet-4-5 (best balance), claude-haiku-4-5 (fastest), claude-opus-4-5 (most capable).
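The two paid providers differ at the HTTP level in how the key is presented: OpenAI takes a Bearer token, while Anthropic uses an `x-api-key` header plus a required `anthropic-version` header. A sketch of that difference, based on each vendor's public REST documentation (illustrative, not LayerSense's code):

```python
PROVIDERS = {
    "openai": {
        "url": "https://api.openai.com/v1/chat/completions",
        "auth": lambda key: {"Authorization": f"Bearer {key}"},
    },
    "anthropic": {
        "url": "https://api.anthropic.com/v1/messages",
        "auth": lambda key: {
            "x-api-key": key,
            "anthropic-version": "2023-06-01",  # required version header
        },
    },
}

def auth_headers(provider: str, key: str) -> dict:
    """Return the auth headers a request to the given provider needs."""
    return PROVIDERS[provider]["auth"](key)
```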
— Which provider to choose?
  • Just want it to work for free → Cerebras with gpt-oss-120b (3,000+ tok/s)
  • Need maximum privacy → local (LM Studio)
  • Already have OpenAI/Anthropic keys → use your existing provider
  • Want cutting-edge intelligence → Google Gemini with gemini-3-pro
  • Want to try many different models → OpenRouter