Summarisation Modes


Nutshell AI offers three summarisation modes that control how much of each document is read before the summary is generated. The mode is configured per tenant on the AI Processing Settings page in the Squirrel admin portal.

The three modes

Mode	Scan coverage	Throughput per worker	Best for
Brief	Opening slice only (first ~16,000 characters, typically the introduction and first few pages)	~50 files/min (~3,000/hour)	Triage, executive dashboards, high-volume bulk passes
Standard	Start, middle, and end slices (~48,000 characters in total)	~40 files/min (~2,400/hour)	General-purpose summaries that balance coverage with throughput
Detailed	Entire document, no omissions	~12 files/min (~720/hour)	Compliance, legal, and archival reference where full fidelity is required

Throughput figures are measured against a ~7,500-word, 1 MB Word document. Real numbers vary with file size, file type, and SharePoint API throttling.
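The scan-coverage column can be pictured with a short Python sketch of how each mode might pick the text it reads. This is not Nutshell's implementation: the 16,000-character slice size comes from the table, but centring the middle slice, reading short documents whole in Standard mode, and the function name `select_text` are assumptions made for illustration.

```python
from __future__ import annotations

SLICE_CHARS = 16_000  # assumed slice size, taken from the ~16,000-character figure above


def select_text(text: str, mode: str) -> list[str]:
    """Return the portions of `text` a mode would read (hypothetical sketch)."""
    mode = mode.lower()
    if mode == "detailed":
        # Detailed: the entire document, no omissions.
        return [text]
    if mode == "brief":
        # Brief: the opening slice only.
        return [text[:SLICE_CHARS]]
    if mode == "standard":
        # A document that fits inside three slices is read whole,
        # rather than returning overlapping slices.
        if len(text) <= 3 * SLICE_CHARS:
            return [text]
        # Centre the middle slice; for texts longer than three slices
        # it cannot overlap the start or end slice.
        mid_start = (len(text) - SLICE_CHARS) // 2
        return [
            text[:SLICE_CHARS],                       # start
            text[mid_start:mid_start + SLICE_CHARS],  # middle
            text[-SLICE_CHARS:],                      # end
        ]
    raise ValueError(f"unknown mode: {mode!r}")
```

For a 100,000-character document, Brief reads 16,000 characters, Standard reads three slices totalling 48,000, and Detailed reads all 100,000. Anything that falls between Standard's slices is what Detailed adds.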

Choosing a mode

Brief is enough when the goal is discovery — someone searches, finds a candidate document via its summary, and decides whether to restore it. The opening slice of a document usually contains the executive summary, abstract, or introduction, so Brief mode captures the "what is this about" answer at the highest throughput.
Standard is the sensible default for most tenants. Reading three slices catches the closing recommendations, action items, or conclusions that a Brief scan would miss on longer documents.
Detailed is for content where every clause matters: contracts, policies, technical specifications, and records with retention obligations. Because Detailed reads the entire document, its summary can draw on every section, which makes it the closest of the three modes to a complete précis of the original.

Mode is set per tenant, but Nutshell will re-summarise documents on demand, so it is reasonable to start with Standard and switch to Detailed, re-summarising the affected documents, if you find summaries missing important content that falls between the start, middle, and end slices.

Per-worker throughput and scaling

A worker is a GPU-backed processing engine. Nutshell licensing scales by worker — capacity is added by adding workers, and throughput is reported per worker. Scaling is roughly linear: doubling the worker count roughly doubles files-per-minute.

For example, at Standard mode:

1 worker → ~2,400 files / hour
4 workers → ~9,600 files / hour
10 workers → ~24,000 files / hour

Real throughput is bounded by SharePoint API rate limits and Azure Blob read throughput. As aggregate demand approaches those limits, each additional worker adds less, and past them extra workers add little or nothing.
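The arithmetic behind the worker figures is simple enough to capture in a few lines, which can help when estimating how long a backlog will take. This Python sketch is a planning aid, not part of Nutshell: `files_per_hour`, `hours_for_backlog`, and the optional `ceiling_per_hour` cap are hypothetical names. The rates are the approximate per-worker figures from the table, and any ceiling value would have to come from your own measurements of SharePoint and Blob limits.

```python
from typing import Optional

# Approximate per-worker rates from the modes table, in files per minute.
FILES_PER_MIN = {"brief": 50, "standard": 40, "detailed": 12}


def files_per_hour(mode: str, workers: int, ceiling_per_hour: Optional[float] = None) -> float:
    """Linear throughput estimate, optionally capped by an external ceiling."""
    linear = FILES_PER_MIN[mode] * 60 * workers  # doubling workers doubles the estimate
    if ceiling_per_hour is None:
        return linear
    # A hard cap standing in for SharePoint / Blob limits. This is a simplification:
    # real throughput tapers off as it nears the limit rather than stopping abruptly.
    return min(linear, ceiling_per_hour)


def hours_for_backlog(files: int, mode: str, workers: int,
                      ceiling_per_hour: Optional[float] = None) -> float:
    """Rough time to work through a backlog of `files` documents."""
    return files / files_per_hour(mode, workers, ceiling_per_hour)


print(round(hours_for_backlog(100_000, "standard", 4), 1))  # 10.4
```

With no ceiling, `files_per_hour("standard", n)` reproduces the 2,400, 9,600, and 24,000 files/hour figures for 1, 4, and 10 workers. Supplying a measured ceiling shows the point at which adding workers stops shortening the backlog.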

Intelligent resource management

Each Nutshell worker monitors its own CPU and GPU load in real time and adjusts how many documents it processes concurrently:

Under heavier load — Nutshell reduces per-worker concurrency to stay stable and avoid thermal throttling.
When spare capacity is available — Nutshell scales concurrency back up automatically.

The effect is consistent throughput across large batches without manual tuning. Operators do not need to size worker concurrency by hand.
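A load-based controller of this kind can be sketched in a few lines of Python. It is not Nutshell's actual algorithm. It substitutes a common additive-increase/multiplicative-decrease (AIMD) pattern, and the class name, the 0.60 and 0.85 utilisation thresholds, and the 1 to 8 slot range are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConcurrencyController:
    """Illustrative load-based concurrency controller (hypothetical thresholds)."""
    min_slots: int = 1
    max_slots: int = 8
    high_load: float = 0.85  # above this utilisation, back off
    low_load: float = 0.60   # below this utilisation, there is spare capacity
    slots: int = 4           # documents processed concurrently

    def update(self, cpu_util: float, gpu_util: float) -> int:
        # Treat the busier of the two resources as the load signal.
        load = max(cpu_util, gpu_util)
        if load > self.high_load:
            # Heavier load: halve concurrency quickly to stay stable.
            self.slots = max(self.min_slots, self.slots // 2)
        elif load < self.low_load:
            # Spare capacity: ramp back up one slot at a time.
            self.slots = min(self.max_slots, self.slots + 1)
        # Between the thresholds, hold steady to avoid oscillating.
        return self.slots


controller = ConcurrencyController()
for cpu, gpu in [(0.50, 0.95), (0.40, 0.90), (0.30, 0.40), (0.70, 0.70)]:
    print(controller.update(cpu, gpu))
```

Backing off sharply and recovering gradually is the usual reason for choosing AIMD. A worker that overheats sheds load at once, while a brief lull does not immediately push concurrency back to its maximum.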

Sample output per mode

Side-by-side sample summaries show what each mode actually produces from the same source document. Reading them is the fastest way to decide which mode to configure.