Stringify AI · CheckLLM

How we'll build CheckLLM

A compliant AI layer that sits on top of a customer's Office 365 / Google Workspace — we observe how a team works, surface friction, and deliver custom business-AI on top of it as a SOC 2-ready service.

Lead: Manish
Sponsor: Gopal
Phase: Build · ~55%
Milestone: Sell first app · Dec 2026

The idea in one line

Customers already live in Office 365 / Google Workspace. CheckLLM plugs in as a trusted layer, reads their working context (with permission), and ships custom AI workflows on top — packaged, observable, and compliant — rather than another app they have to adopt.

Architecture at a glance

End-to-end: a tenant's users sign in through their own workspace; requests hit the Next.js app behind Traefik; the AI layer answers using Claude grounded in that tenant's data; everything runs on our Dokploy/Docker VM, with Postgres isolating each tenant's rows via Row-Level Security (RLS) and encrypted backups going to GCS.

Workspace (the customer)

Office 365 + Google Workspace — where the customer's data and identity already live.

▲ connects via

Integration layer

Microsoft Graph API + Google Workspace APIs — read context (read-only first), respecting workspace permissions.

▲

CheckLLM app

Next.js (UI + API routes). Sign-in uses the same workspace identity, which is the compliance hook.

▲

AI layer

Anthropic Claude (Opus 4.8 / Sonnet) + retrieval (pgvector) — answers grounded in the customer's own documents.

▲

Data layer

Postgres (our own) or Supabase + pgvector — app data, embeddings, audit log. See decision below.

▲ runs on

Infrastructure

Dokploy + Docker + Traefik + Let's Encrypt on our VM — same pipeline we already run.
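
The integration layer's "read-only first" rule can be checked in code before any OAuth consent request is built. A minimal sketch, assuming we request only the scopes listed here; the scope strings are real Microsoft Graph and Google scopes, but which ones CheckLLM actually needs, and the helper names, are assumptions for illustration.

```typescript
// Hypothetical allow-list of read-only OAuth scopes per provider.
// Which scopes CheckLLM really requests is an assumption.
type Provider = "microsoft" | "google";

const READ_ONLY_SCOPES: Record<Provider, readonly string[]> = {
  microsoft: ["User.Read", "Mail.Read", "Files.Read.All", "Calendars.Read"],
  google: [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/calendar.readonly",
  ],
};

// Returns the requested scopes that are NOT on the read-only allow-list.
function disallowedScopes(provider: Provider, requested: string[]): string[] {
  const allowed = READ_ONLY_SCOPES[provider];
  return requested.filter((scope) => allowed.indexOf(scope) === -1);
}

// Throws if any requested scope is outside the allow-list.
function assertReadOnly(provider: Provider, requested: string[]): void {
  const bad = disallowedScopes(provider, requested);
  if (bad.length > 0) {
    throw new Error(`Non-read-only scopes for ${provider}: ${bad.join(", ")}`);
  }
}
```

Calling assertReadOnly at the point where the consent URL is assembled would turn an accidental write scope into a build-time or request-time failure rather than a compliance finding.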

The stack & why

Layer	Tool	Why this
App framework	Next.js + TypeScript	One framework for the UI and the backend API routes — fewer moving parts.
UI	Tailwind CSS + shadcn/ui	Fast, consistent, accessible components — looks polished without a design lift.
State / data	Zustand + React Query	Zustand for client state, React Query for server data & caching — our standard pattern.
Auth	NextAuth + Entra ID + Google	Users sign in with their own Microsoft / Google Workspace identity — no new account, and it's the basis of the compliance story.
AI	Anthropic Claude	The reasoning and assistant layer. Opus 4.8 for hard tasks, Sonnet for fast/cheap.
Retrieval (RAG)	pgvector	Embed and search the customer's docs so answers are grounded in their data, not generic.
Integrations	MS Graph + Google APIs	Read O365 / Workspace context (mail, docs, calendar) with scoped, revocable permissions.
Database	Postgres own or Supabase	Decision below — both are Postgres, so the choice stays reversible.
ORM	Prisma	Type-safe DB access; we already use it, and it keeps us portable across the two DB options.
Runtime / PM	Bun	Our standard across Stringify repos.
Deploy	Dokploy · Docker · Traefik	Existing VM pipeline — auto SSL, no new infra to learn.
Compliance	Audit log · RBAC · encryption	Built in from day one so SOC 2 is groundwork, not a retrofit.
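
The "Opus for hard tasks, Sonnet for fast/cheap" split in the AI row implies a routing rule somewhere in the app. A minimal sketch, assuming a hypothetical task descriptor; the tier names stand in for real model IDs, and the threshold is illustrative rather than tuned.

```typescript
// Hypothetical routing rule. Tier names are placeholders for real model IDs,
// and the 50000-token threshold is an assumption for illustration.
type ModelTier = "opus" | "sonnet";

interface AiTask {
  kind: "summarize" | "classify" | "multi_step_analysis";
  contextTokens: number;     // rough size of the retrieved context
  latencySensitive: boolean; // e.g. an interactive UI request
}

function pickModelTier(task: AiTask): ModelTier {
  // Interactive requests favor the faster, cheaper tier.
  if (task.latencySensitive) return "sonnet";
  // Multi-step reasoning or very large contexts go to the stronger tier.
  if (task.kind === "multi_step_analysis" || task.contextTokens > 50000) {
    return "opus";
  }
  return "sonnet";
}
```

Keeping the rule in one function means a per-tenant override (for example, a cost cap) would be a config change, not a code fork.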

Database — our own vs Supabase

Both options are Postgres underneath, so the data model and our Prisma code don't change. The real choice is how much we build & run ourselves vs how much we rent ready-made — i.e. one-time effort now against ongoing happiness later.

A · Our own deployed Postgres

Dokploy-managed on our VM — the way we run every other DB.

Pros:
Fastest at runtime: sits next to the app, no external network hop
Full control and data residency: cleanest SOC 2 story
We already operate it: Dokploy, daily GCS backups, pgvector
Near-zero marginal cost: runs on a VM we already pay for
No lock-in: plain Postgres, fully portable
Cons:
We build auth, storage, APIs, realtime ourselves
We own the ops forever: patching, scaling, monitoring, on-call

Best when control, speed & cost come first.

B · Supabase

Managed Postgres with batteries included.

Pros:
Auth + Storage + Realtime + auto REST/GraphQL + Edge Functions + pgvector, out of the box
Fastest to a first version: far less to build
Managed infra: they patch, scale, back up, keep it up
The platform itself is SOC 2 Type II attested
Cons:
Recurring cost that grows with usage
External subprocessor: customer data leaves our infra (residency review)
Some lock-in: auth/storage/edge-functions aren't trivially portable

Best when speed to ship & low ops come first.

Side by side

What matters	Our own Postgres	Supabase
Setup (one-time)	Higher — provision, wire auth/storage/backups	Lower — usable in minutes
What you get	A database; we build the rest	DB + auth + storage + realtime + APIs + functions
Runtime speed	Lowest latency — co-located with the app	Network hop + pooler unless co-located
Ongoing ops	Ours — patch, scale, monitor, on-call	Theirs — managed
Cost	~Free (on a VM we already pay for)	Free tier → Pro from $25/mo → Team from $599/mo, plus usage
Data control / SOC 2	Full control, stays on our infra	External subprocessor; vendor review needed
Lock-in	None — portable Postgres	Some — managed features are sticky
Scaling	Manual — bigger VM, read replicas	Plan-based, less effort
Dev velocity	Slower first features	Faster MVP

One-time effort vs long-term happiness

A · Our own Postgres

One-time effort — higher

Stand up auth (NextAuth), file storage (GCS), connection pooling, and our own dashboards/scripts. A few extra weeks of plumbing up front.

Long-term — happier if we have ops bandwidth

No bills that grow, no vendor in the critical path, fastest runtime, clean compliance. The cost is that every patch, scale and 2am page is ours.

B · Supabase

One-time effort — lower

Connect and go — auth, storage, realtime and APIs already exist. We ship the first use case noticeably sooner.

Long-term — happier if we'd rather not run infra

Fewer 2am pages, automatic scaling. The cost is a recurring bill that grows, an external party holding customer data, and some lock-in to unwind later.

Recommendation: Go with our own Postgres as the strategic base. We already have the operational muscle (Dokploy, GCS backups, pgvector), it is the fastest at runtime, it costs almost nothing on top of the VM, and it gives the cleanest data-residency story, which matters because SOC 2 and customer workspace data are core to CheckLLM. We trade a few weeks of one-time setup for no recurring bill, no vendor in the data path, and zero lock-in.

The honest hybrid: if speed-to-first-demo is the priority, we could prototype on Supabase and migrate before the SOC 2 audit — but only cleanly if we use it purely as Postgres (our own NextAuth + GCS), since its auth/storage are the parts that don't port. If we lean on those, the migration stops being free. So: own Postgres unless the meeting decides demo speed beats everything.

How we'll build it

Foundation

Next.js app, workspace auth (Entra + Google), DB schema, deploy pipeline.

Gate: app deploys + login works

Workspace integration

Read O365 / Workspace context via Graph + Google APIs — read-only first.

Gate: real customer context in

AI layer

Claude + RAG over workspace docs; ship the first observational use case.

Gate: one use case end-to-end

Delivery-ready

Package one app, polish, sign-off (the 2-week push).

Gate: Jul 2026 sign-off

SOC 2 / QA

Audit logging, RBAC, controls review.

Gate: Aug 2026 controls pass

Building for many clients

The bet: build the hard plumbing once, spin up each new client mostly from config. First client is slow because we're building the platform; every client after is fast because we're just configuring it. The model is one platform, many tenants — never a forked codebase per client.

Every client plugs in their own workspace; one shared engine serves all of them; each tenant's data stays walled off by Row-Level Security.

One codebase, per-client config

Auth, the O365 / Google connectors, the AI + RAG layer, and the SOC 2 controls are written once
A new client = a tenant record + their workspace connection + which use-cases are switched on
Per-tenant feature flags & small plugin modules tailor behaviour — no fork, no N codebases to patch
SOC 2 covers every tenant at once, because it's one platform
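
The "tenant record + workspace connection + switched-on use-cases" idea above can be sketched as a typed config. This is an illustration only: the field names, flag names, and the example tenant are hypothetical.

```typescript
// Hypothetical shape of a tenant record; all names are illustrative.
type WorkspaceProvider = "office365" | "google_workspace";

interface TenantConfig {
  tenantId: string;
  workspace: WorkspaceProvider;
  enabledUseCases: readonly string[];       // which use-cases are switched on
  flags: Readonly<Record<string, boolean>>; // per-tenant flag overrides
}

// Platform defaults; a tenant record only stores its overrides.
const DEFAULT_FLAGS: Readonly<Record<string, boolean>> = {
  meetingDigest: false,
  inboxTriage: false,
};

function isFlagOn(tenant: TenantConfig, flag: string): boolean {
  // Tenant override wins, then the platform default, then "off" for unknown flags.
  return tenant.flags[flag] ?? DEFAULT_FLAGS[flag] ?? false;
}

function isUseCaseEnabled(tenant: TenantConfig, useCase: string): boolean {
  return tenant.enabledUseCases.indexOf(useCase) !== -1;
}

// Example tenant: one use-case on, one flag overridden.
const acme: TenantConfig = {
  tenantId: "acme",
  workspace: "office365",
  enabledUseCases: ["meeting-digest"],
  flags: { meetingDigest: true },
};
```

Because unknown flags default to off, adding a flag for one client never changes behaviour for the others.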

Data isolation — the real decision

Fastest to grow

Shared DB · row-level

Every row tagged tenant_id, enforced by Postgres Row-Level Security so a query can never leak across clients.

Lowest ops · onboard a client in minutes · relies on strict RLS

Middle ground

Schema-per-tenant

Each client gets their own Postgres schema inside one database — stronger separation, shared engine.

Moderate ops · clearer boundaries

Strongest isolation

DB / deploy-per-tenant

Full physical separation per client — best for compliance & data residency.

Highest ops · for big or sensitive clients

Recommendation: Start shared-DB + RLS for speed, and let a big or especially sensitive client graduate to a dedicated DB. Same code path, just a different connection — so isolation becomes a per-client setting, not a rewrite.
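
For the shared-DB + RLS starting point, the Postgres side could look roughly like the statements generated here. A minimal sketch, assuming every tenant-owned table has a uuid tenant_id column and that the app sets a custom session setting called app.tenant_id; both names, and the helper functions, are hypothetical.

```typescript
// Builds the Postgres statements that put one table under tenant-scoped RLS.
// Assumptions: a uuid `tenant_id` column and a custom setting `app.tenant_id`.
function rlsStatements(table: string): string[] {
  // Table names are interpolated into SQL, so only plain identifiers are allowed.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`Unsafe table name: ${table}`);
  }
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY;`,
    // FORCE applies the policy to the table owner too, not only to other roles.
    `ALTER TABLE ${table} FORCE ROW LEVEL SECURITY;`,
    `CREATE POLICY tenant_isolation ON ${table}
       USING (tenant_id = current_setting('app.tenant_id')::uuid)
       WITH CHECK (tenant_id = current_setting('app.tenant_id')::uuid);`,
  ];
}

// Parameterized statement to run at the start of each transaction.
// The third argument `true` makes the setting local to that transaction,
// so a pooled connection cannot carry one tenant's id into the next request.
function tenantScopeQuery(tenantId: string): { text: string; values: string[] } {
  return {
    text: "SELECT set_config('app.tenant_id', $1, true);",
    values: [tenantId],
  };
}
```

Two caveats worth keeping in mind: superusers and roles with BYPASSRLS skip policies entirely, so the app should connect as an ordinary role; and current_setting raises an error when app.tenant_id is unset, which makes a forgotten tenant scope fail closed rather than return every tenant's rows.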

Onboarding a new client — the fast loop

Connect workspace

Client OAuths their own O365 / Google — tokens scoped per tenant.

Ingest their data

Their docs flow into per-tenant RAG — never mixed with another client's.

Toggle use-cases

Switch on the workflows they need; flag any custom bits.

Ship

Live for that client — days, not months.
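
The "per-tenant RAG, never mixed" step comes down to filtering by tenant before ranking by similarity. The sketch below is an in-memory stand-in for that query, not the pgvector implementation itself; the Chunk shape and function names are hypothetical.

```typescript
// In-memory stand-in for the retrieval query: tenant filter first, then
// cosine-similarity ranking. Shapes and names are illustrative.
interface Chunk {
  tenantId: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topKForTenant(chunks: Chunk[], tenantId: string, query: number[], k: number): Chunk[] {
  return chunks
    .filter((c) => c.tenantId === tenantId) // the tenant wall comes before ranking
    .map((c) => ({ chunk: c, score: cosineSimilarity(c.embedding, query) }))
    .sort((x, y) => y.score - x.score)      // highest similarity first
    .slice(0, k)
    .map((r) => r.chunk);
}
```

In Postgres the ranking would typically be an ORDER BY on pgvector's cosine-distance operator (<=>) with a LIMIT, and the tenant filter would come from Row-Level Security rather than from application code.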

The tension to name: shared-DB multi-tenancy is the fastest way to grow, but it puts the strictness of our Row-Level Security between us and a cross-client data leak. Getting RLS right is a hard requirement — and it's exactly what our SOC 2 Confidentiality criterion is there to prove.

Backend model

Same backend codebase for every client — always. What we vary is the deployment and the data boundary, never the code. Different code per client is the agency death-spiral (N repos, N audits) — we don't do it.

Same code and the same auth layer for everyone. Most clients share one instance; a big or regulated client graduates to a dedicated one — a config + deploy change, not a rewrite.

Model	Code	Compute	Data	Use when
Pooled default	Shared	Shared instance	Shared DB + RLS	Most clients — fastest, cheapest
Siloed premium	Shared	Dedicated instance	Dedicated DB	Big / regulated / data-residency

Auth: one shared auth layer in both rows. Each client's users sign in through their own Office 365 / Google Workspace, but the auth system is one piece of code, configured per tenant. Isolate at the data and deployment layer — never fork at the code layer.
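
"A config + deploy change, not a rewrite" suggests that placement is just data on the tenant record. A minimal sketch under that assumption; the Placement type, the connection strings, and the function name are hypothetical.

```typescript
// Hypothetical placement record: most tenants are pooled, a few are siloed.
type Placement =
  | { model: "pooled" }
  | { model: "siloed"; databaseUrl: string };

interface Deployment {
  databaseUrl: string;
  dedicated: boolean;
}

// Illustrative URL for the shared instance; not a real host.
const POOLED_DATABASE_URL = "postgres://app@pooled-db.internal:5432/checkllm";

// Same code path for every tenant; only the connection target differs.
function resolveDeployment(placement: Placement): Deployment {
  switch (placement.model) {
    case "pooled":
      return { databaseUrl: POOLED_DATABASE_URL, dedicated: false };
    case "siloed":
      return { databaseUrl: placement.databaseUrl, dedicated: true };
  }
}
```

Graduating a client from pooled to siloed then means migrating their rows and changing one field, with no branch in the application logic.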

Handling per-client requirements

When one client needs an API or feature the others don't, it becomes a module or a flag on the shared core — never a fork. We go down this ladder, cheapest first.

Effort rises from the first rung to the last, and frequency falls. The vast majority of "custom" asks resolve at the two cheapest rungs without touching the codebase.

Config flag: The difference is "turn X on" or a parameter. Client A's flag is on, everyone else's is off. Zero code divergence.
AI-layer config: Often a "custom API" is really a new assistant task: a new tool + RAG connector + workflow. That's configuration of the shared engine, with no backend code at all. The CheckLLM shortcut.
Plugin module: A genuinely new endpoint or niche integration ships as an optional module in the same codebase, activated only for entitled tenants. Core untouched; same deploy; same audit.
Siloed deploy: Too heavy or proprietary to ship to everyone → their own instance running the same base code + their module. Rare, reserved for big/regulated clients.

The rule that keeps the platform healthy: build the one-off for the first client; generalize it on the second ask. A recurring need graduates from "Client A's module" to a toggle everyone can use — the platform gets richer over time instead of bloating with dead custom code.

Guardrails: entitlements gate visibility (a client's custom API simply isn't exposed to others) · a bespoke module must never degrade the shared path · the default answer to "can you build us X" is "yes, as a module," not "yes, in a branch." This is the open/closed principle — core closed for modification, open for extension.
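
The entitlement guardrail can be sketched as a registry where core routes are always exposed and module routes appear only for entitled tenants. Module names and route paths below are hypothetical.

```typescript
// Hypothetical module registry: the core stays closed for modification,
// optional modules extend it and are gated by tenant entitlements.
interface PluginModule {
  name: string;
  routes: readonly string[];
}

const CORE_ROUTES: readonly string[] = ["/api/assistant", "/api/search"];

const MODULES: readonly PluginModule[] = [
  { name: "erp-export", routes: ["/api/modules/erp-export"] },
  { name: "legal-hold", routes: ["/api/modules/legal-hold"] },
];

// Entitlements come from the tenant record; entries are module names.
function exposedRoutes(entitlements: readonly string[]): string[] {
  const extra = MODULES
    .filter((m) => entitlements.indexOf(m.name) !== -1)
    .reduce<string[]>((acc, m) => acc.concat(m.routes), []);
  return [...CORE_ROUTES, ...extra];
}
```

A tenant with no entitlements sees exactly the core routes, so a bespoke module is invisible to everyone who has not been granted it.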

What SOC 2 requires

SOC 2 is built on the AICPA Trust Services Criteria (2017, revised points of focus 2022). We pick which criteria are in scope — Security is always required — then prove our controls actually operated over a 6–12 month window. That's a Type II report.

Trust Services Criteria — what we'll claim

Security

Required

Confidentiality

In scope

Availability

In scope

Processing Integrity

Later

Privacy

Later

We start with Security + Confidentiality + Availability — Security is mandatory, and the other two matter because we hold customer workspace data and run a delivery service. Processing Integrity and Privacy can be added in a later audit cycle.

The controls we have to build (Common Criteria CC1–CC9)

CC1

Governance & oversight

Information security policy set
Defined roles & ownership
Management review cadence

CC2

Communication

Policies published to the team
Security training on onboarding
Channel to report incidents

CC3

Risk assessment

Annual risk assessment
Threat model for the O365 / Workspace integration

CC4

Monitoring

Continuous control monitoring
Compliance tooling (Vanta / Drata / Secureframe)
Periodic log review

CC5

Control activities

Documented controls + evidence
Segregation of duties

CC6

Access & encryption

SSO + MFA (Entra / Google)
RBAC, least privilege
Quarterly access reviews + automated offboarding
Encryption: TLS 1.2+ in transit, AES-256 at rest

CC7

System operations

Uptime / error / security alerting
Written, tested incident-response plan
Dependency scanning + annual pen test

CC8

Change management

PR review + CI on every change
Controlled releases (release-please)
Separate dev / prod environments

CC9

Risk mitigation

Vendor / subprocessor review (Anthropic, Microsoft, Google, GCS)
DR: daily backups + tested restores
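
Two of the CC6 items, least-privilege RBAC and periodic access reviews, are easy to picture in code. A minimal sketch; the roles, permissions, and the 90-day idle threshold used in the example are assumptions, not agreed policy.

```typescript
// Hypothetical role-to-permission map; all names are illustrative.
type Role = "viewer" | "analyst" | "tenant_admin";
type Permission = "docs:read" | "usecase:run" | "usecase:configure" | "users:manage";

const ROLE_PERMISSIONS: Record<Role, readonly Permission[]> = {
  viewer: ["docs:read"],
  analyst: ["docs:read", "usecase:run"],
  tenant_admin: ["docs:read", "usecase:run", "usecase:configure", "users:manage"],
};

// Deny by default: access is granted only when the role lists the permission.
function can(role: Role, permission: Permission): boolean {
  return ROLE_PERMISSIONS[role].indexOf(permission) !== -1;
}

interface Account {
  email: string;
  role: Role;
  lastSignIn: Date;
}

// Input for an access review: accounts idle for longer than maxIdleDays.
function staleAccounts(accounts: Account[], now: Date, maxIdleDays: number): Account[] {
  const cutoff = now.getTime() - maxIdleDays * 24 * 60 * 60 * 1000;
  return accounts.filter((a) => a.lastSignIn.getTime() < cutoff);
}
```

The output of staleAccounts would feed the quarterly review as evidence; whether idle accounts are disabled automatically or only flagged is a policy decision this sketch does not make.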

Head start: we already run a few of these — an append-only audit trail and daily encrypted backups to GCS on our existing apps. CheckLLM inherits those patterns from day one, so SOC 2 is groundwork, not a retrofit.
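
The append-only audit trail pattern can be pictured with a small in-memory model. This is a sketch of the contract, not our existing implementation: the event fields are hypothetical, and a real version would be backed by a Postgres table whose application role can insert and select but not update or delete (an assumption).

```typescript
// Hypothetical in-memory model of an append-only audit trail.
interface AuditEvent {
  readonly seq: number;
  readonly at: string; // ISO-8601 timestamp
  readonly tenantId: string;
  readonly actor: string;
  readonly action: string;
}

class AuditLog {
  private readonly events: AuditEvent[] = [];

  // The only write path: there is deliberately no update or delete method.
  append(tenantId: string, actor: string, action: string, at: Date = new Date()): AuditEvent {
    const event: AuditEvent = Object.freeze({
      seq: this.events.length + 1,
      at: at.toISOString(),
      tenantId,
      actor,
      action,
    });
    this.events.push(event);
    return event;
  }

  // filter() returns a new array, so callers cannot reorder or truncate the log.
  forTenant(tenantId: string): AuditEvent[] {
    return this.events.filter((e) => e.tenantId === tenantId);
  }
}
```

Sequence numbers make gaps visible during log review, and frozen events mean an entry cannot be edited after it is written.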

The path: choose Type II → set the testing window (6–12 months) → readiness / gap assessment → run controls & collect evidence (automate with Vanta/Drata) → independent CPA firm performs the audit. Budget realistically ~$25k–$80k and 9–18 months end to end, so the SOC 2 work runs in parallel with the build, not after it.