Summarisation Modes
Nutshell AI offers three summarisation modes that control how much of a document is read before the summary is generated. The mode is configured per tenant from the AI Processing Settings page in the Squirrel admin portal.
The three modes
| Mode | Scan coverage | Throughput per worker | Best for |
|---|---|---|---|
| Brief | Opening slice only (first ~16,000 characters — typically the introduction and first few pages) | ~50 files/min (~3,000/hour) per worker | Triage, executive dashboards, high-volume bulk passes |
| Standard | Start + middle + end slices (~48,000 characters total) | ~40 files/min (~2,400/hour) per worker | General-purpose summaries that balance coverage with throughput |
| Detailed | Entire document, no omissions | ~12 files/min (~720/hour) per worker | Compliance, legal, archival reference where full fidelity is required |
Throughput figures are measured against a ~7,500-word, 1 MB Word document. Real numbers vary with file size, file type, and SharePoint API throttling.
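The scan coverage in the table can be pictured with a short Python sketch. It is illustrative only and not Nutshell's implementation: the function name `select_slices`, the equal three-way split of Standard's ~48,000 characters into 16,000-character slices, and the centred position of the middle slice are all assumptions, since the table gives only approximate totals.

```python
SLICE_CHARS = 16_000  # Brief's opening slice size, from the table (approximate)


def select_slices(text: str, mode: str, slice_chars: int = SLICE_CHARS) -> list[str]:
    """Return the parts of `text` that a mode would read (hypothetical sketch)."""
    if mode == "detailed":
        # Detailed reads the entire document.
        return [text]
    if mode == "brief":
        # Brief reads only the opening slice.
        return [text[:slice_chars]]
    if mode == "standard":
        # Assumption: three equal slices (start, middle, end).
        if len(text) <= 3 * slice_chars:
            return [text]  # short documents are covered in full
        mid_start = (len(text) - slice_chars) // 2
        return [
            text[:slice_chars],
            text[mid_start:mid_start + slice_chars],
            text[-slice_chars:],
        ]
    raise ValueError(f"unknown mode: {mode!r}")
```

Under these assumptions, a 100,000-character document yields 16,000 characters in Brief, 48,000 in Standard, and all 100,000 in Detailed, which is why content between the sampled slices can go unread in Standard.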
Choosing a mode
- Brief is enough when the goal is discovery — someone searches, finds a candidate document via its summary, and decides whether to restore it. The opening slice of a document usually contains the executive summary, abstract, or introduction, so Brief mode captures the "what is this about" answer at the highest throughput.
- Standard is the sensible default for most tenants. Reading three slices catches the closing recommendations, action items, or conclusions that a Brief scan would miss on longer documents.
- Detailed is for content where every clause matters — contracts, policies, technical specifications, records with retention obligations. Detailed reads everything, so a summary produced by Detailed can be treated as an authoritative précis of the original.
The mode is set per tenant, but Nutshell will re-summarise on demand, so it is reasonable to start with Standard and move to Detailed selectively if you find summaries missing important content that falls outside the sampled slices.
Per-worker throughput and scaling
A worker is a GPU-backed processing engine. Nutshell licensing scales by worker: capacity is added by adding workers, and throughput is reported per worker. Scaling is approximately linear, so doubling the worker count roughly doubles files per minute.
For example, at Standard mode:
- 1 worker → ~2,400 files / hour
- 4 workers → ~9,600 files / hour
- 10 workers → ~24,000 files / hour
Real throughput is bounded by SharePoint API rate limits and Azure Blob read throughput; adding workers beyond that ceiling produces diminishing returns.
Intelligent resource management
Each Nutshell worker monitors its own CPU and GPU load in real time and adjusts how many documents it processes concurrently:
- Under heavier load — Nutshell reduces per-worker concurrency to stay stable and avoid thermal throttling.
- When spare capacity is available — Nutshell scales concurrency back up automatically.
The effect is consistent throughput across large batches without manual tuning. Operators do not need to size worker concurrency by hand.
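The behaviour described above resembles a simple feedback loop. The Python sketch below shows one way such a loop could work. It is not Nutshell's actual algorithm: the function name, the load thresholds, the step size of one, and the concurrency bounds are all assumptions chosen for illustration.

```python
def next_concurrency(current: int, cpu_load: float, gpu_load: float, *,
                     high: float = 0.85, low: float = 0.60,
                     min_concurrency: int = 1, max_concurrency: int = 8) -> int:
    """Pick the next per-worker concurrency from current load (hypothetical values).

    Loads are fractions between 0.0 and 1.0; the busier of CPU and GPU decides.
    """
    load = max(cpu_load, gpu_load)
    if load > high:
        # Heavy load: back off by one, but never below the minimum.
        return max(min_concurrency, current - 1)
    if load < low:
        # Spare capacity: step up by one, but never above the maximum.
        return min(max_concurrency, current + 1)
    # Between the thresholds: hold steady to avoid oscillation.
    return current
```

The gap between the `low` and `high` thresholds is a deliberate design choice in this sketch: without it, concurrency would flip up and down on every measurement.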
Sample output per mode
Side-by-side sample summaries show what each mode actually produces from the same source document. Reading them is the fastest way to decide which mode to configure.
See also
- Temperature settings — the other lever, controlling wording style.
- Supported file types — what Nutshell can summarise in each mode.
- Sample summaries — actual output from the three modes.
- How Nutshell works — where mode selection fits in the pipeline.
Need help? Contact support@smikar.com.