
Stringify AI · CheckLLM

How we'll build CheckLLM

A compliant AI layer that sits on top of a customer's Office 365 / Google Workspace — we observe how a team works, surface friction, and deliver custom business-AI on top of it as a SOC 2-ready service.

Lead: Manish
Sponsor: Gopal
Phase: Build · ~55%
Milestone: Sell first app · Dec 2026

The idea in one line

Customers already live in Office 365 / Google Workspace. CheckLLM plugs in as a trusted layer, reads their working context (with permission), and ships custom AI workflows on top — packaged, observable, and compliant — rather than another app they have to adopt.

Architecture at a glance

[Diagram: client users (per tenant) → Traefik (TLS & routing) → CheckLLM application on Next.js (frontend: React · Tailwind · Zustand + React Query; API routes; auth: NextAuth + Entra/Google; AI orchestration; per-tenant config & modules; integration: MS Graph + Google APIs) → Postgres (tenant_id · RLS · pgvector) and AI + retrieval (Claude + RAG on pgvector). Infrastructure: Dokploy · Docker · Traefik on the Stringify VM, with encrypted daily backups → GCS. External services: Microsoft 365 / Google Workspace (identity + data, per tenant) and the Anthropic Claude API. Legend: request flow; external API / backup.]
End-to-end: a tenant's users sign in through their own workspace; requests hit the Next.js app behind Traefik; the AI layer answers using Claude grounded in that tenant's data; everything runs on our Dokploy/Docker VM, with Postgres isolating each tenant through Row-Level Security (RLS) and encrypted daily backups to GCS.
Workspace (the customer)
Office 365 + Google Workspace — where the customer's data and identity already live.
▲ connects via
Integration layer
Microsoft Graph API + Google Workspace APIs — read context (read-only first), respecting workspace permissions.
CheckLLM app
Next.js (UI + API routes). Sign-in is the same workspace identity — that's the compliance hook.
AI layer
Anthropic Claude (Opus 4.8 / Sonnet) + retrieval (pgvector) — answers grounded in the customer's own documents.
Data layer
Postgres (our own) or Supabase + pgvector — app data, embeddings, audit log. See decision below.
▲ runs on
Infrastructure
Dokploy + Docker + Traefik + Let's Encrypt on our VM — same pipeline we already run.

The stack & why

Layer | Tool | Why this
App framework | Next.js + TypeScript | One framework for the UI and the backend API routes, so fewer moving parts.
UI | Tailwind CSS + shadcn/ui | Fast, consistent, accessible components; looks polished without a design lift.
State / data | Zustand + React Query | Zustand for client state, React Query for server data & caching; our standard pattern.
Auth | NextAuth + Entra ID + Google | Users sign in with their own Microsoft / Google Workspace identity: no new account, and it's the basis of the compliance story.
AI | Anthropic Claude | The reasoning and assistance layer. Opus 4.8 for hard tasks, Sonnet for fast/cheap.
Retrieval (RAG) | pgvector | Embed and search the customer's docs so answers are grounded in their data, not generic.
Integrations | MS Graph + Google APIs | Read O365 / Workspace context (mail, docs, calendar) with scoped, revocable permissions.
Database | Own Postgres or Supabase | Decision below; both are Postgres, so the choice stays reversible.
ORM | Prisma | Type-safe DB access; we already use it, and it keeps us portable across the two DB options.
Runtime / PM | Bun | Our standard across Stringify repos.
Deploy | Dokploy · Docker · Traefik | Existing VM pipeline: auto SSL, no new infra to learn.
Compliance | Audit log · RBAC · encryption | Built in from day one so SOC 2 is groundwork, not a retrofit.
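
The AI row's split between a stronger model for hard tasks and a faster, cheaper one for everything else could be expressed as a small routing function. This is a minimal sketch, not the actual routing logic: the task fields, the token threshold, and the tier names are all hypothetical, and real model IDs are deliberately left out.

```typescript
// Hypothetical tier names; map these to real model IDs in config.
type ModelTier = "opus" | "sonnet";

// Hypothetical task descriptor; none of these fields come from the plan.
interface AiTask {
  kind: "summarize" | "classify" | "multi_step_analysis";
  contextTokens: number; // rough size of the retrieved context
  latencySensitive: boolean; // true for interactive UI requests
}

// Assumed cut-off, to be tuned against real usage.
const LARGE_CONTEXT_TOKENS = 50000;

function pickModelTier(task: AiTask): ModelTier {
  // Interactive requests favour the fast, cheap tier.
  if (task.latencySensitive) return "sonnet";
  // "Hard" is approximated here as multi-step reasoning or a very large context.
  if (task.kind === "multi_step_analysis") return "opus";
  if (task.contextTokens > LARGE_CONTEXT_TOKENS) return "opus";
  return "sonnet";
}
```

Keeping the decision in one function means the cost/quality trade-off can be tuned per tenant later without touching call sites.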

Database — our own vs Supabase

Both options are Postgres underneath, so the data model and our Prisma code don't change. The real choice is how much we build & run ourselves vs how much we rent ready-made — i.e. one-time effort now against ongoing happiness later.

A · Our own deployed Postgres

Dokploy-managed on our VM — the way we run every other DB.

  • Pro: Fastest at runtime; sits next to the app, no network hop
  • Pro: Full control & data residency; cleanest SOC 2 story
  • Pro: We already operate it: Dokploy, daily GCS backups, pgvector
  • Pro: Near-zero marginal cost; runs on a VM we already pay for
  • Pro: No lock-in; plain Postgres, fully portable
  • Con: We build auth, storage, APIs, realtime ourselves
  • Con: We own the ops forever: patching, scaling, monitoring, on-call
Best when control, speed & cost come first.

B · Supabase

Managed Postgres with batteries included.

  • Pro: Auth + Storage + Realtime + auto REST/GraphQL + Edge Functions + pgvector, out of the box
  • Pro: Fastest to a first version; far less to build
  • Pro: Managed infra: they patch, scale, back up, keep it up
  • Pro: The platform itself is SOC 2 Type II audited
  • Con: Recurring cost that grows with usage
  • Con: External subprocessor; customer data leaves our infra (residency review)
  • Con: Some lock-in; auth/storage/edge-functions aren't trivially portable
Best when speed to ship & low ops come first.

Side by side

What matters | Our own Postgres | Supabase
Setup (one-time) | Higher: provision, wire auth/storage/backups | Lower: usable in minutes
What you get | A database; we build the rest | DB + auth + storage + realtime + APIs + functions
Runtime speed | Lowest latency: co-located with the app | Network hop + pooler unless co-located
Ongoing ops | Ours: patch, scale, monitor, on-call | Theirs: managed
Cost | ~Free: on a VM we already pay for | Free tier → Pro $25/mo/project → Team $599/mo + usage
Data control / SOC 2 | Full control, stays on our infra | External subprocessor; vendor review needed
Lock-in | None: portable Postgres | Some: managed features are sticky
Scaling | Manual: bigger VM, read replicas | Plan-based, less effort
Dev velocity | Slower first features | Faster MVP

One-time effort vs long-term happiness

A · Our own Postgres

One-time effort — higher

Stand up auth (NextAuth), file storage (GCS), connection pooling, and our own dashboards/scripts. A few extra weeks of plumbing up front.

Long-term — happier if we have ops bandwidth

No bills that grow, no vendor in the critical path, fastest runtime, clean compliance. The cost is that every patch, scale and 2am page is ours.

B · Supabase

One-time effort — lower

Connect and go — auth, storage, realtime and APIs already exist. We ship the first use case noticeably sooner.

Long-term — happier if we'd rather not run infra

Fewer 2am pages, automatic scaling. The cost is a recurring bill that grows, an external party holding customer data, and some lock-in to unwind later.

Recommendation: Go with our own Postgres as the strategic base. We already have the operational muscle (Dokploy, GCS backups, pgvector), it is the fastest at runtime, it costs almost nothing on top of the VM, and it gives the cleanest data-residency story, which matters because SOC 2 and customer workspace data are core to CheckLLM. We trade a few weeks of one-time setup for no recurring bill, no vendor in the data path, and zero lock-in.

The honest hybrid: if speed-to-first-demo is the priority, we could prototype on Supabase and migrate before the SOC 2 audit — but only cleanly if we use it purely as Postgres (our own NextAuth + GCS), since its auth/storage are the parts that don't port. If we lean on those, the migration stops being free. So: own Postgres unless the meeting decides demo speed beats everything.

How we'll build it

01
Foundation
Next.js app, workspace auth (Entra + Google), DB schema, deploy pipeline.
Gate: app deploys + login works
02
Workspace integration
Read O365 / Workspace context via Graph + Google APIs — read-only first.
Gate: real customer context in
03
AI layer
Claude + RAG over workspace docs; ship the first observational use case.
Gate: one use case end-to-end
04
Delivery-ready
Package one app, polish, sign-off (the 2-week push).
Gate: Jul 2026 sign-off
05
SOC 2 / QA
Audit logging, RBAC, controls review.
Gate: Aug 2026 controls pass
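
Phase 02's "read-only first" rule could be enforced with an allow-list of OAuth scopes that is checked before any consent request goes out. The sketch below assumes CheckLLM would request the read scopes listed; the scope strings are standard Microsoft Graph and Google read-only scopes, but which ones are actually needed, and the function name, are assumptions.

```typescript
type Provider = "microsoft" | "google";

// Read-only delegated scopes. Which of these CheckLLM really requests is an assumption.
const READ_ONLY_SCOPES: Record<Provider, readonly string[]> = {
  microsoft: ["User.Read", "Mail.Read", "Files.Read.All", "Calendars.Read"],
  google: [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/calendar.readonly",
  ],
};

// Returns the requested scopes that are NOT on the read-only allow-list.
// An empty result means the consent request stays within "read-only first".
function disallowedScopes(provider: Provider, requested: string[]): string[] {
  const allowed = new Set<string>(READ_ONLY_SCOPES[provider]);
  return requested.filter((scope) => !allowed.has(scope));
}
```

For example, `disallowedScopes("microsoft", ["Mail.Read", "Mail.Send"])` flags `Mail.Send`, so a write scope cannot slip into a tenant's connection unnoticed.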

Building for many clients

The bet: build the hard plumbing once, spin up each new client mostly from config. First client is slow because we're building the platform; every client after is fast because we're just configuring it. The model is one platform, many tenants — never a forked codebase per client.

[Diagram: Clients (Client A: Office 365; Client B: Google Workspace; Client C: Office 365; Client D: Workspace) → One platform, the CheckLLM Platform (auth: Entra + Google; O365 / Google connectors; AI + RAG with Claude; per-tenant config & flags; SOC 2 controls) → Isolated data in Postgres + RLS (Tenant A, Tenant B, Tenant C, Tenant D).]
Every client plugs in their own workspace; one shared engine serves all of them; each tenant's data stays walled off by Row-Level Security.

One codebase, per-client config

  • Auth, the O365 / Google connectors, the AI + RAG layer, and the SOC 2 controls are written once
  • A new client = a tenant record + their workspace connection + which use-cases are switched on
  • Per-tenant feature flags & small plugin modules tailor behaviour — no fork, no N codebases to patch
  • SOC 2 covers every tenant at once, because it's one platform
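
The "tenant record + workspace connection + switched-on use-cases" idea above might look like the following. It is a sketch under stated assumptions: the field names, the flag names, and the defaults are hypothetical and not taken from the actual schema.

```typescript
type WorkspaceProvider = "office365" | "google_workspace";

// Hypothetical shape of a tenant record.
interface TenantConfig {
  tenantId: string;
  workspace: WorkspaceProvider;
  enabledUseCases: string[]; // which workflows are switched on
  flags: Record<string, boolean>; // per-tenant overrides only
}

// Platform-wide defaults; flag names are illustrative.
const DEFAULT_FLAGS: Record<string, boolean> = {
  meetingSummaries: false,
  inboxTriage: false,
};

function isFlagOn(tenant: TenantConfig, flag: string): boolean {
  // A tenant override wins; otherwise fall back to the default; unknown flags are off.
  const override = tenant.flags[flag];
  if (override !== undefined) return override;
  return DEFAULT_FLAGS[flag] === true;
}

// Onboarding a client is then just a new record, with no code change.
const clientA: TenantConfig = {
  tenantId: "tenant-a",
  workspace: "office365",
  enabledUseCases: ["inbox-triage"],
  flags: { inboxTriage: true },
};
```

Storing only overrides per tenant keeps the records small and lets a default change roll out to every tenant at once.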

Data isolation — the real decision

Fastest to grow
Shared DB · row-level

Every row tagged tenant_id, enforced by Postgres Row-Level Security so a query can never leak across clients.

Lowest ops · onboard a client in minutes · relies on strict RLS
Middle ground
Schema-per-tenant

Each client gets their own Postgres schema inside one database — stronger separation, shared engine.

Moderate ops · clearer boundaries
Strongest isolation
DB / deploy-per-tenant

Full physical separation per client — best for compliance & data residency.

Highest ops · for big or sensitive clients
Recommendation: Start shared-DB + RLS for speed, and let a big or especially sensitive client graduate to a dedicated DB. Same code path, just a different connection — so isolation becomes a per-client setting, not a rewrite.
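
For the shared-DB option, isolation rests on a Postgres policy attached to every tenant-scoped table. The sketch below generates that SQL. It assumes the app sets a session variable named `app.tenant_id` on each request (a name chosen here for illustration), that `tenant_id` is a `uuid` column, and that a table such as `documents` exists; none of these details are confirmed by the plan.

```typescript
// Builds the SQL that turns on Row-Level Security for one tenant-scoped table.
function rlsStatements(table: string): string[] {
  // Only plain identifiers are accepted, so the table name cannot inject SQL.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`invalid table name: ${table}`);
  }
  const tenantMatch = "tenant_id = current_setting('app.tenant_id')::uuid";
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY;`,
    // FORCE makes the policy apply to the table owner as well.
    `ALTER TABLE ${table} FORCE ROW LEVEL SECURITY;`,
    // USING filters reads; WITH CHECK blocks writes into another tenant's rows.
    `CREATE POLICY tenant_isolation ON ${table} USING (${tenantMatch}) WITH CHECK (${tenantMatch});`,
  ];
}

// Run at the start of each request's transaction; `true` scopes it to that transaction.
const SET_TENANT_SQL = "SELECT set_config('app.tenant_id', $1, true);";
```

One caveat worth stating: Postgres superusers and roles with `BYPASSRLS` skip these policies, so the application would need to connect as an ordinary role for the guarantee to hold.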

Onboarding a new client — the fast loop

01
Connect workspace
Client OAuths their own O365 / Google — tokens scoped per tenant.
02
Ingest their data
Their docs flow into per-tenant RAG — never mixed with another client's.
03
Toggle use-cases
Switch on the workflows they need; flag any custom bits.
04
Ship
Live for that client — days, not months.
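
Step 02's promise that one client's documents are never mixed with another's comes down to filtering by tenant before ranking. The in-memory sketch below illustrates that ordering; in production the ranking would more likely be a pgvector query (for example with its `<=>` cosine-distance operator), and the `Chunk` shape and function names here are assumptions.

```typescript
// Hypothetical shape of an embedded document chunk.
interface Chunk {
  tenantId: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((ai, i) => {
    const bi = b[i] ?? 0; // treat a missing dimension as 0
    dot += ai * bi;
    normA += ai * ai;
  });
  b.forEach((bi) => {
    normB += bi * bi;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Returns the k most similar chunks, considering only the given tenant's data.
function retrieve(chunks: Chunk[], tenantId: string, query: number[], k: number): Chunk[] {
  return chunks
    .filter((c) => c.tenantId === tenantId) // tenant filter happens before any ranking
    .map((c) => ({ chunk: c, score: cosineSimilarity(c.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.chunk);
}
```

Because the filter runs first, a perfect match in another tenant's data can never appear in the results, whatever its score.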
The tension to name: shared-DB multi-tenancy is the fastest way to grow, but it puts the strictness of our Row-Level Security between us and a cross-client data leak. Getting RLS right is a hard requirement — and it's exactly what our SOC 2 Confidentiality criterion is there to prove.

Backend model

Same backend codebase for every client — always. What we vary is the deployment and the data boundary, never the code. Different code per client is the agency death-spiral (N repos, N audits) — we don't do it.

[Diagram: a shared auth layer (Entra ID + Google, per-tenant config) in front of one backend codebase. Pooled (default): a shared instance serving Client A, Client B and Client C on a shared DB with Row-Level Security. Siloed (premium): a dedicated instance for Client D (big / regulated) on a dedicated DB.]
Same code and the same auth layer for everyone. Most clients share one instance; a big or regulated client graduates to a dedicated one — a config + deploy change, not a rewrite.
Model | Code | Compute | Data | Use when
Pooled (default) | Shared | Shared instance | Shared DB + RLS | Most clients: fastest, cheapest
Siloed (premium) | Shared | Dedicated instance | Dedicated DB | Big / regulated / data-residency
Auth: one shared auth layer in both rows. Each client's users sign in through their own Office 365 / Google Workspace, but the auth system is one piece of code, configured per tenant. Isolate at the data and deployment layer — never fork at the code layer.
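
Treating pooled versus siloed as "a config + deploy change, not a rewrite" suggests that the only tenant-specific step is choosing a connection. A minimal sketch, assuming a per-tenant deployment record; the field names and the placeholder URL are hypothetical.

```typescript
type Isolation = "pooled" | "siloed";

// Hypothetical per-tenant deployment record.
interface TenantDeployment {
  tenantId: string;
  isolation: Isolation;
  dedicatedDbUrl?: string; // only set for siloed tenants
}

// Placeholder; the real value would come from environment config.
const SHARED_DB_URL = "postgres://shared-db.internal/checkllm";

function resolveDbUrl(tenant: TenantDeployment): string {
  if (tenant.isolation === "pooled") return SHARED_DB_URL;
  // Fail loudly rather than silently falling back to the shared database.
  if (!tenant.dedicatedDbUrl) {
    throw new Error(`siloed tenant ${tenant.tenantId} has no dedicated DB configured`);
  }
  return tenant.dedicatedDbUrl;
}
```

Throwing on a misconfigured siloed tenant matters: a silent fallback to the shared DB would quietly break the isolation that tenant is paying for.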

Handling per-client requirements

When one client needs an API or feature the others don't, it becomes a module or a flag on the shared core — never a fork. We go down this ladder, cheapest first.

[Diagram: effort & rarity ladder. Config flag (most asks) → AI-layer config (common) → Plugin module (occasional) → Siloed deploy (rare).]
Rising effort left to right — and falling frequency. The vast majority of "custom" asks resolve at the two cheapest rungs without touching the codebase.
  • Config flag: The difference is "turn X on" or a parameter. Client A's flag is on, everyone else's is off. Zero code divergence.
  • AI-layer config: Often a "custom API" is really a new assistant task: a new tool + RAG connector + workflow. That's configuration of the shared engine, with no backend code at all. The CheckLLM shortcut.
  • Plugin module: A genuinely new endpoint or niche integration ships as an optional module in the same codebase, activated only for entitled tenants. Core untouched; same deploy; same audit.
  • Siloed deploy: Too heavy or proprietary to ship to everyone → their own instance running the same base code + their module. Rare, reserved for big/regulated clients.
The rule that keeps the platform healthy: build the one-off for the first client; generalize it on the second ask. A recurring need graduates from "Client A's module" to a toggle everyone can use — the platform gets richer over time instead of bloating with dead custom code.
Guardrails: entitlements gate visibility (a client's custom API simply isn't exposed to others) · a bespoke module must never degrade the shared path · the default answer to "can you build us X" is "yes, as a module," not "yes, in a branch." This is the open/closed principle — core closed for modification, open for extension.
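
The "yes, as a module" rule and the entitlement guardrail could be sketched as a module registry in which a tenant only ever sees routes from modules it is entitled to. The module names, routes, and tenant IDs below are purely illustrative.

```typescript
// Hypothetical module descriptor.
interface PluginModule {
  name: string;
  routes: string[]; // endpoints this module adds
}

const MODULES: PluginModule[] = [
  { name: "core", routes: ["/api/chat", "/api/search"] },
  { name: "client-a-erp-export", routes: ["/api/erp-export"] }, // a one-off for one client
];

// Optional modules each tenant is entitled to; "core" is always included.
const ENTITLEMENTS: Record<string, string[]> = {
  "tenant-a": ["client-a-erp-export"],
  "tenant-b": [],
};

function visibleRoutes(tenantId: string): string[] {
  const entitled = new Set<string>(["core", ...(ENTITLEMENTS[tenantId] ?? [])]);
  const routes: string[] = [];
  for (const mod of MODULES) {
    // A module's routes are exposed only to entitled tenants.
    if (entitled.has(mod.name)) routes.push(...mod.routes);
  }
  return routes;
}
```

Generalizing a module "on the second ask" then means moving it from one tenant's entitlement list to a flag everyone can switch on, with no change to the core.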

What SOC 2 requires

SOC 2 is built on the AICPA Trust Services Criteria (2017, revised points of focus 2022). We pick which criteria are in scope — Security is always required — then prove our controls actually operated over a 6–12 month window. That's a Type II report.

Trust Services Criteria — what we'll claim

Security: Required
Confidentiality: In scope
Availability: In scope
Processing Integrity: Later
Privacy: Later

We start with Security + Confidentiality + Availability — Security is mandatory, and the other two matter because we hold customer workspace data and run a delivery service. Processing Integrity and Privacy can be added in a later audit cycle.

The controls we have to build (Common Criteria CC1–CC9)

CC1
Governance & oversight
  • Information security policy set
  • Defined roles & ownership
  • Management review cadence
CC2
Communication
  • Policies published to the team
  • Security training on onboarding
  • Channel to report incidents
CC3
Risk assessment
  • Annual risk assessment
  • Threat model for the O365 / Workspace integration
CC4
Monitoring
  • Continuous control monitoring
  • Compliance tooling (Vanta / Drata / Secureframe)
  • Periodic log review
CC5
Control activities
  • Documented controls + evidence
  • Segregation of duties
CC6
Access & encryption
  • SSO + MFA (Entra / Google)
  • RBAC, least privilege
  • Quarterly access reviews + automated offboarding
  • Encryption: TLS 1.2+ in transit, AES-256 at rest
CC7
System operations
  • Uptime / error / security alerting
  • Written, tested incident-response plan
  • Dependency scanning + annual pen test
CC8
Change management
  • PR review + CI on every change
  • Controlled releases (release-please)
  • Separate dev / prod environments
CC9
Risk mitigation
  • Vendor / subprocessor review (Anthropic, Microsoft, Google, GCS)
  • DR: daily backups + tested restores
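
CC6's "RBAC, least privilege" item can be pictured as a fixed role-to-permission map with a single check function. The roles and permission names below are hypothetical, chosen only to illustrate the pattern.

```typescript
// Hypothetical roles and permissions.
type Role = "viewer" | "analyst" | "tenant_admin";
type Permission = "docs:read" | "usecase:run" | "members:manage" | "audit:read";

// Each role lists exactly what it may do; anything absent is denied.
const ROLE_PERMISSIONS: Record<Role, readonly Permission[]> = {
  viewer: ["docs:read"],
  analyst: ["docs:read", "usecase:run"],
  tenant_admin: ["docs:read", "usecase:run", "members:manage", "audit:read"],
};

function can(role: Role, permission: Permission): boolean {
  return ROLE_PERMISSIONS[role].includes(permission);
}
```

A static map like this is also easy to export as evidence for the quarterly access reviews, since it states in one place who can do what.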

Head start: we already run a few of these — an append-only audit trail and daily encrypted backups to GCS on our existing apps. CheckLLM inherits those patterns from day one, so SOC 2 is groundwork, not a retrofit.
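
As an illustration of the append-only audit trail pattern, the sketch below exposes a single write path and hands out frozen entries. It is not the existing implementation: the event fields are assumptions, and a real deployment would presumably enforce immutability in the database as well (for instance by not granting UPDATE or DELETE on the audit table).

```typescript
// Hypothetical audit event shape.
interface AuditEvent {
  readonly seq: number; // position in the log, starting at 1
  readonly at: string; // ISO-8601 timestamp
  readonly tenantId: string;
  readonly actor: string;
  readonly action: string;
}

class AuditLog {
  private readonly events: AuditEvent[] = [];

  // The only write path: there is deliberately no update or delete method.
  append(tenantId: string, actor: string, action: string, now: Date = new Date()): AuditEvent {
    const event: AuditEvent = Object.freeze({
      seq: this.events.length + 1,
      at: now.toISOString(),
      tenantId,
      actor,
      action,
    });
    this.events.push(event);
    return event;
  }

  // filter() returns a new array, so callers cannot mutate the underlying log.
  forTenant(tenantId: string): AuditEvent[] {
    return this.events.filter((e) => e.tenantId === tenantId);
  }
}
```

Scoping reads by tenant keeps the audit log consistent with the isolation model used for the rest of the data.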

The path: choose Type II → set the testing window (6–12 months) → readiness / gap assessment → run controls & collect evidence (automate with Vanta/Drata) → independent CPA firm performs the audit. Budget realistically ~$25k–$80k and 9–18 months end to end, so the SOC 2 work runs in parallel with the build, not after it.