Auth, Rate Limits, Audit Logs: Production MCP

TL;DR

Productionizing a LinkedIn MCP server comes down to three pillars in one stack: authentication, rate limiting, and audit logs. Each pillar carries a one-line rule that holds even when traffic scales past the demo phase.

Auth - bearer tokens with explicit scopes and one-click revocation. Long-lived tokens, never raw credentials, never inherited from a session cookie.
Rate limits - throttle per token, not per user or per IP. A token is the only stable principal in an MCP loop.
Audit logs - log every write before the network call leaves the box. Forensics matter more than analytics; the log is your TOS defense.

Anthropic’s MCP specification, introduced in late 2024 and updated through the 2025-06-18 protocol revision, defines the wire format. It specifies an optional, OAuth 2.1-based authorization flow for HTTP transports, but it leaves token policy, throttling, and observability to the server. Skip those three pillars and your demo MCP becomes a Friday-night incident the first time an agent loops.

Why “production-grade MCP” isn’t just MCP that compiles

A working MCP server and a production MCP server differ by far more than code; the gap is in failure surface. The Anthropic specification covers JSON-RPC transport, tool discovery, and resource exposure. It does not cover what happens when a misconfigured Claude Desktop config loops 4,000 tool calls in 90 seconds, or when a leaked token surfaces in a public GitHub gist three months after issuance, or when LinkedIn’s Trust and Safety team asks you to prove a specific message wasn’t sent by your platform.

Production hardening lives in the layer above the spec. The same three concerns show up in every payment API, every SSH bastion, and every webhook receiver shipped in the last decade: who is calling, how fast, and what did they do. MCP is no exception. The protocol changes the surface area; the operational discipline stays the same. The OWASP API Security Top 10 (2023) still ranks broken authentication (API2) and unrestricted resource consumption (API4) among the most critical API risks.

The point is not that MCP is risky. The point is that an MCP server exposed at a public URL with a bearer-token contract is now your production API. Treat it that way from day one.

Citation capsule. Anthropic’s MCP specification defines transport and discovery, makes authorization optional, and leaves rate limiting and observability to the server implementer. Production hardening therefore reuses well-known API patterns: bearer tokens with revocation, per-principal throttling, and write-side audit trails. The OWASP API Security Top 10 ranks broken authentication (API2) and unrestricted resource consumption (API4) among the most critical API risks (OWASP API Security Top 10, 2023).

Pillar 1, authentication: how should an MCP server prove who is calling?

The cleanest auth surface for an MCP server is a long-lived bearer token, scoped per principal, revocable in one click, transmitted in the Authorization: Bearer header per RFC 6750. That single sentence covers 90% of the design. The remaining 10% is the operational discipline around token issuance, scope, rotation, and revocation, and it’s where every team gets surprised.

Why Laravel Sanctum over OAuth2 for MCP

Laravel Sanctum gives you bearer-token API authentication without an OAuth2 server. That tradeoff matters for MCP. An OAuth2 authorization-code flow assumes a human in a browser approving a third-party app. An MCP client, by contrast, is usually a desktop binary that the user already trusts: Claude Desktop, Cursor, a local Python script. There is no third party.

Sanctum issues a token, hashes the secret in personal_access_tokens, and validates the bearer header on every request. The middleware is auth:sanctum. The protocol is RFC 6750. The implementation is roughly 40 lines of route configuration. OAuth2 ships the same outcome through a redirect dance that no AI agent benefits from, plus three extra moving parts (authorization server, refresh tokens, scope negotiation) that have to stay healthy 24/7.

The MCP specification’s 2025 revisions adopted OAuth 2.1 for HTTP transports and recommend OAuth 2.0 Dynamic Client Registration (RFC 7591). That is a signal that federated MCP across vendors does have a future. For a single-vendor LinkedIn MCP, that future is overkill today.

Bearer token scopes and revocation

A bearer token without a scope is a master key. Production servers issue tokens with explicit ability lists. Sanctum supports this natively through the abilities argument on createToken(). A token tagged mcp:read can list contacts and read sequences; a token tagged mcp:write adds the create-and-launch tools. The MCP server checks $request->user()->tokenCan('mcp:write') before invoking any write tool.
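
Here is a Python sketch of enforcing abilities at the tool boundary rather than the route boundary. `AccessToken`, `requires`, and the two tool functions are hypothetical. The `can()` rule (exact match, or a `*` wildcard) mirrors how Sanctum evaluates abilities, but this is an illustration, not its implementation.

```python
from dataclasses import dataclass, field
from functools import wraps
from typing import Callable, List


@dataclass
class AccessToken:
    id: int
    # Mirrors the abilities list given at issuance; ["*"] grants every ability.
    abilities: List[str] = field(default_factory=lambda: ["*"])

    def can(self, ability: str) -> bool:
        return "*" in self.abilities or ability in self.abilities


class ForbiddenAbility(Exception):
    pass


def requires(ability: str) -> Callable:
    """Hypothetical decorator: check the token's ability before the tool body runs."""
    def decorator(tool: Callable) -> Callable:
        @wraps(tool)
        def wrapper(token: AccessToken, *args, **kwargs):
            if not token.can(ability):
                raise ForbiddenAbility(f"token {token.id} lacks {ability}")
            return tool(token, *args, **kwargs)
        return wrapper
    return decorator


@requires("mcp:read")
def list_contacts(token: AccessToken) -> List[str]:
    return ["contact-1", "contact-2"]  # placeholder data


@requires("mcp:write")
def launch_sequence(token: AccessToken, sequence_id: str) -> str:
    return f"launched {sequence_id}"  # placeholder for the real write
```

Ability matching is exact, so a token meant to both read and write should be issued with both `mcp:read` and `mcp:write`.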

Revocation is the other half of the contract. Every issued token must surface in a user-visible list with a “revoke” button that performs DELETE FROM personal_access_tokens WHERE id = ? synchronously. No cron, no eventual consistency. A leaked token revoked at 14:02 must be dead by 14:02:03, before the agent’s next tool call.
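
Below is a minimal sketch of synchronous revocation. It uses SQLite from the Python standard library as a stand-in for the production database. The table name follows the text. `find_token` and `revoke` are hypothetical helpers. Every request re-reads the row, so once the DELETE commits, the next lookup fails.

```python
import hashlib
import hmac
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personal_access_tokens ("
    " id INTEGER PRIMARY KEY, name TEXT NOT NULL, token TEXT NOT NULL UNIQUE)"
)


def find_token(token_id: int, secret: str) -> Optional[int]:
    """Per-request lookup with no cache: a deleted row fails on the very next call."""
    row = conn.execute(
        "SELECT token FROM personal_access_tokens WHERE id = ?", (token_id,)
    ).fetchone()
    if row is None:
        return None
    candidate = hashlib.sha256(secret.encode()).hexdigest()
    return token_id if hmac.compare_digest(row[0], candidate) else None


def revoke(token_id: int) -> bool:
    """Synchronous revoke: the DELETE is committed before this function returns."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "DELETE FROM personal_access_tokens WHERE id = ?", (token_id,)
        )
    return cur.rowcount == 1
```

A token cache placed in front of this lookup would reintroduce the eventual consistency the text rules out. If one is added, revocation has to evict from it in the same call.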

The first-party auth flow for LinkedIn

The LinkedIn session itself sits one layer below the bearer token. The MCP server holds the agent’s token; Fintalio’s session infrastructure holds the LinkedIn auth state for that user’s connected account. A hosted auth wizard, served on a Fintalio-controlled domain, captures the LinkedIn login through the user’s own browser. The user enters their credentials directly on LinkedIn’s UI, never on a Fintalio form. The session token returns to Fintalio’s session infrastructure, scoped to that single user account, and is referenced from the database by an account UUID, never by raw cookie material.

This is the load-bearing design choice for TOS compliance. In hiQ Labs v. LinkedIn, the Ninth Circuit’s April 2022 opinion drew its Computer Fraud and Abuse Act line between scraping publicly accessible data and accessing data behind an authentication gate. The case was not finally resolved until a December 2022 settlement, after the district court found hiQ had breached LinkedIn’s User Agreement. First-party auth, where the user owns the session and consents to the action, keeps access on the authorized side of that line. Headless-browser scraping with shared cookies does not.

Token rotation and the security audit

Tokens rotate on three triggers, in order of frequency: device change, teammate departure, and credential leak. The rotation flow is identical in all three cases: revoke the old token, issue a new one, update the client config. Sanctum’s database schema makes this trivial; the entire dance is two SQL statements and a config file edit. Teams that skip this discipline end up running quarterly “token amnesty” sweeps where they revoke everything older than 90 days and ask users to re-authenticate. That is expensive, brittle, and avoidable.
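
The rotation dance can be sketched with Python’s built-in SQLite. The DELETE and INSERT run in one transaction, so a failure never leaves the user with zero valid tokens or with two. The `issue` and `rotate` helpers and the schema are assumptions for illustration.

```python
import hashlib
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personal_access_tokens ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL, token TEXT NOT NULL UNIQUE)"
)


def issue(name: str) -> str:
    """Issue a new token; only its SHA-256 digest is stored."""
    secret = secrets.token_hex(20)
    with conn:
        cur = conn.execute(
            "INSERT INTO personal_access_tokens (name, token) VALUES (?, ?)",
            (name, hashlib.sha256(secret.encode()).hexdigest()),
        )
    return f"{cur.lastrowid}|{secret}"


def rotate(old_token_id: int, name: str) -> str:
    """Revoke the old token and issue its replacement atomically."""
    secret = secrets.token_hex(20)
    with conn:  # both statements commit together, or neither does
        deleted = conn.execute(
            "DELETE FROM personal_access_tokens WHERE id = ?", (old_token_id,)
        ).rowcount
        if deleted != 1:
            raise LookupError(f"token {old_token_id} does not exist")
        cur = conn.execute(
            "INSERT INTO personal_access_tokens (name, token) VALUES (?, ?)",
            (name, hashlib.sha256(secret.encode()).hexdigest()),
        )
    # Shown to the user once; this is what goes into the client config edit.
    return f"{cur.lastrowid}|{secret}"
```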

Pillar 2, rate limiting: what’s the right unit of throttling for an MCP server?

The unit of throttling for an MCP server is the bearer token. Not the user, not the IP, not the LinkedIn account. A token is the only stable principal in the MCP loop, and it’s the only one the agent itself can be bounded against. Per-token throttling at the HTTP layer, supplemented by per-account daily ceilings at the LinkedIn-action layer, is the two-layer pattern that holds up under real agent traffic.

Why per-token 120 req/min, and not per-user or per-IP

Per-user throttling breaks the moment a user runs two agents in parallel: one in Claude Desktop, one in a Python notebook. Both share the user ID, so a single rogue loop in the notebook locks the desktop. Per-IP throttling breaks the moment a developer ships an agent to production behind a load balancer. The egress IP collapses to a small set, and one tenant starves the others.

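A token, by contrast, is the unit the agent actually owns. Throttle per token and a runaway loop in one notebook only locks that notebook, leaving the desktop and the production deployment unaffected. Laravel’s throttle middleware expresses this in about one line, with one caveat. The stock throttle:120,1 form keys authenticated requests by user ID. A strictly per-token limit therefore needs a named rate limiter keyed on the Sanctum token ID, such as RateLimiter::for() returning Limit::perMinute(120)->by($request->user()->currentAccessToken()->id). Fintalio’s /mcp endpoint is throttled at 120 requests per minute per Sanctum token. The number was chosen against three constraints: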

Interactive agent loops typically issue 5 to 20 tool calls per user turn. 120 req/min covers 6 to 24 turns per minute, well above any human-paced session.
Bulk imports (CSV parse + commit) issue one batch call, not 200 individual ones. 120 req/min is never the bottleneck for legitimate batch work.
Runaway loops at 1,000 req/min get caught at the throttle layer before they reach the LinkedIn action layer, where the damage would be account-level.
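
A per-token fixed-window counter is enough to show the mechanics. This single-process Python sketch is an assumption, not Laravel’s limiter, and `PerTokenLimiter` is hypothetical. A multi-server deployment would keep the counters in a shared cache; Laravel’s rate limiter likewise stores its counters in the cache store.

```python
import math
import time
from typing import Callable, Dict, Tuple


class PerTokenLimiter:
    """Fixed-window request counter keyed by token id."""

    def __init__(self, max_attempts: int = 120, window_seconds: int = 60,
                 clock: Callable[[], float] = time.monotonic):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self.clock = clock
        # token_id -> (window start time, hits counted in this window)
        self._windows: Dict[int, Tuple[float, int]] = {}

    def hit(self, token_id: int) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds); retry_after is 0 when allowed."""
        now = self.clock()
        start, count = self._windows.get(token_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # previous window expired; open a fresh one
        if count >= self.max_attempts:
            # Seconds until this token's window resets: the Retry-After value.
            return False, max(1, math.ceil(start + self.window - now))
        self._windows[token_id] = (start, count + 1)
        return True, 0
```

When `hit` returns `(False, n)`, the HTTP layer answers 429 Too Many Requests with `Retry-After: n`. A fixed window allows a burst of up to twice the limit across a window boundary. A sliding window or token bucket smooths that out at the cost of more state.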

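OWASP’s API4:2023 Unrestricted Resource Consumption guidance is explicit that APIs need rate limiting: cap how often a client can call the API within a defined timeframe. Keying that limit per principal, here the token, keeps one client’s runaway loop from consuming everyone else’s budget.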

The two-layer pattern

The HTTP throttle is layer one. Layer two is a per-LinkedIn-account daily ceiling enforced inside the action layer, not at the network edge. Fintalio’s SendThrottle service caps writes per LinkedIn account at 50 connections per day and 50 messages per day, configurable via SEQUENCE_* env vars. These ceilings exist regardless of how many MCP tokens a user has issued.

The split matters. The HTTP throttle bounds the agent’s cadence; the action ceiling bounds the LinkedIn account’s daily footprint. An agent that bursts 119 read-only ListContacts calls in a minute is fine on both layers. An agent that tries to send 51 messages from one LinkedIn account in a 24-hour window is rejected at the action layer with a structured error, no matter how slowly it paces them.
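
The action-layer ceiling can be sketched as a rolling 24-hour window per LinkedIn account and action type. `DailyCeiling` and `CeilingExceeded` are hypothetical names, not Fintalio’s SendThrottle API. In practice the in-memory deque would live in a database or cache. The structured `payload` is the kind of error an MCP tool could return to the agent instead of a bare failure. The sketch assumes `now` never moves backwards.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, Tuple

WINDOW = timedelta(hours=24)


class CeilingExceeded(Exception):
    """Raised before any send happens; payload is a structured error for the agent."""

    def __init__(self, account_id: str, action: str, limit: int):
        super().__init__(f"{action} ceiling of {limit} per 24h reached for {account_id}")
        self.payload = {"error": "daily_ceiling_reached", "account_id": account_id,
                        "action": action, "limit": limit}


class DailyCeiling:
    """Rolling 24-hour send ceiling per (LinkedIn account, action type)."""

    def __init__(self, limits: Dict[str, int]):
        self.limits = limits  # e.g. {"connection": 50, "message": 50}
        self.sent: Dict[Tuple[str, str], Deque[datetime]] = defaultdict(deque)

    def reserve(self, account_id: str, action: str, now: datetime) -> int:
        """Record one send or raise; returns the sends used in the current window."""
        window = self.sent[(account_id, action)]
        while window and now - window[0] >= WINDOW:
            window.popleft()  # forget sends older than 24 hours
        limit = self.limits[action]
        if len(window) >= limit:
            raise CeilingExceeded(account_id, action, limit)
        window.append(now)
        return len(window)
```

The check is independent of pacing. Fifty sends spread across eight hours still block the fifty-first until the oldest send ages out of the window.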

The action-layer ceilings sit below the underlying API path’s account-level caps, which the LinkedIn session provider also enforces. Three layers of throttling, each catching a different failure mode, each cheap to add and expensive to skip.

Bounding agent burst behavior

Agents misbehave in two characteristic ways: they loop, and they fan out. A loop is the same tool called 200 times in 30 seconds because the LLM didn’t notice the previous call returned the answer. A fan-out is the agent calling 50 tools in parallel because it was prompted to “do all of these at once.”

Per-token throttling catches both. Loops trip the 120 req/min ceiling and return a 429 with a Retry-After header. A well-behaved MCP client surfaces the 429 back to the LLM, which can then slow down. Repeated fan-outs trip the same ceiling. The throttle does not distinguish between sequential and parallel calls because, at the HTTP layer, it doesn’t have to.
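
On the client side, respecting the 429 mostly means honoring Retry-After. Here is a minimal sketch that assumes the client receives header values as strings; `retry_delay` is a hypothetical helper. HTTP (RFC 9110) lets the header carry either a number of seconds or an HTTP-date, so the sketch handles both.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(retry_after: Optional[str], now: datetime, default: float = 1.0) -> float:
    """Seconds to wait before retrying, derived from a 429's Retry-After header."""
    if retry_after is None:
        return default
    value = retry_after.strip()
    if value.isdigit():  # delta-seconds form, e.g. "30"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):  # older Pythons raise TypeError on junk
        return default
    if when.tzinfo is None:  # treat a zone-less date as UTC
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())
```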

The lesson from running this in production: 429s are a feature. They are the only signal the LLM gets that its loop is wrong, and the only chance to recover before the LinkedIn account itself gets flagged.

Pillar 3, audit logs: what should an MCP server log on every write?

Every write tool an MCP server exposes must log at least five fields synchronously before the underlying action ships: timestamp, token id, tool name, target identifier, and outcome. Log to durable storage inside the same transaction as the write itself. Logging after the fact is logging the wrong thing.

What gets logged on every write

Fintalio’s audit table captures the following on every call to a write tool (CreateContact, UpdateContact, LaunchSequence, PauseSequence, etc.):

created_at - UTC timestamp at the millisecond
personal_access_token_id - the Sanctum token, never the raw secret
tool_name - the MCP tool the agent invoked
payload_hash - SHA-256 of the serialized arguments, never the cleartext
target_resource_id - the contact, sequence, or template the call touched
outcome - success, rejected_throttle, rejected_validation, rejected_tos
linkedin_account_id - which connected account was the action’s source
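
Here is a Python sketch of writing the audit row inside the same transaction as the write it records, using the standard-library SQLite module. The schema follows the field list above. `create_contact`, its validation rule, and the `contact:<id>` target format are hypothetical. Any outbound LinkedIn call would be dispatched only after this transaction commits, so the audit row exists before the network call leaves the box.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, linkedin_url TEXT NOT NULL);
CREATE TABLE mcp_audit_log (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,               -- UTC, millisecond precision
    personal_access_token_id INTEGER NOT NULL,
    tool_name TEXT NOT NULL,
    payload_hash TEXT NOT NULL,             -- SHA-256 of the arguments, never cleartext
    target_resource_id TEXT,
    outcome TEXT NOT NULL,
    linkedin_account_id TEXT NOT NULL
);
""")


def create_contact(token_id: int, account_id: str, args: dict) -> str:
    """Hypothetical write tool: the audit row commits in the same transaction as the write."""
    payload_hash = hashlib.sha256(
        json.dumps(args, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()
    created_at = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    with conn:  # one transaction: both rows commit, or neither does
        url = args.get("linkedin_url", "")
        if not url.startswith("https://www.linkedin.com/in/"):
            outcome, target = "rejected_validation", None
        else:
            cur = conn.execute("INSERT INTO contacts (linkedin_url) VALUES (?)", (url,))
            outcome, target = "success", f"contact:{cur.lastrowid}"
        conn.execute(
            "INSERT INTO mcp_audit_log (created_at, personal_access_token_id, tool_name,"
            " payload_hash, target_resource_id, outcome, linkedin_account_id)"
            " VALUES (?, ?, ?, ?, ?, ?, ?)",
            (created_at, token_id, "CreateContact", payload_hash, target, outcome, account_id),
        )
    return outcome
```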

Notice what isn’t logged: the LinkedIn message body, the user’s full name, the prospect’s email. PII stays out of the audit log. The payload_hash lets you prove a specific payload was processed without storing it; if a TOS investigation requires the cleartext, it lives in the main messages table with the standard retention policy.
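
The payload_hash only proves anything if the same arguments always serialize the same way. The minimal sketch below assumes JSON-serializable tool arguments and sorts keys and fixes separators before hashing. `payload_hash` and `matches` are hypothetical helpers.

```python
import hashlib
import json


def payload_hash(arguments: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the tool arguments.

    Sorted keys and fixed separators make the hash independent of the
    order in which the client happened to serialize the arguments.
    """
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches(stored_hash: str, candidate_arguments: dict) -> bool:
    """Check whether a cleartext payload (e.g. from the messages table) is the one logged."""
    return payload_hash(candidate_arguments) == stored_hash
```

Arguments can be short or guessable, such as an email address or a one-line message. In that case a keyed HMAC with a server-side secret resists brute-force matching better than a bare SHA-256.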

How audit logs protect both Fintalio and the user

The legal frame for audit logs is the hiQ Labs v. LinkedIn precedent and LinkedIn’s User Agreement Section 8.2. Both bear on the same practical question: was the action taken inside an authenticated session, with the user’s consent, at a human-reasonable cadence? An audit log is the record that answers it.

For Fintalio, the log is the platform-level defense. If LinkedIn’s Trust and Safety team flags an account for “automated, repetitive contact attempts,” the audit log proves the cadence was inside the configured 50/day ceiling and the actions came from the user’s own token after explicit LaunchSequence approval. For the user, the log is personal accountability. The user can see exactly which messages their agent sent, in which order, against which targets.

The log is also the only artifact that distinguishes “the user did this” from “an attacker did this with a leaked token.” Compromise scenarios that lack audit logs end in finger-pointing. Compromise scenarios with audit logs end in token revocation and a clear forensics trail.

Retention and query patterns

Audit retention is a design choice with a clear floor and a soft ceiling. The floor is the longest plausible TOS-investigation window from LinkedIn or any other platform partner, which in practice runs 90 to 180 days. The soft ceiling is GDPR’s storage-limitation principle, which argues against keeping operational logs indefinitely. A 12-month rolling retention threads that needle: long enough for any investigation, short enough that you’re not warehousing two years of token activity for no operational reason.

Query patterns shape the schema. The three queries that matter in practice are:

“show me everything this token did in the last 24 hours” (indexed on personal_access_token_id, created_at)
“show me every write against this LinkedIn account today” (indexed on linkedin_account_id, created_at)
“show me every failed write in the last hour” (indexed on outcome, created_at)

Compound indexes on those three column pairs, plus a partition by month, keep the table responsive at scale.
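
The three query patterns and their compound indexes are sketched below in SQLite through Python’s standard library. The index and query names are hypothetical. Timestamps are stored as UTC ISO-8601 strings in one fixed format, so string comparison matches time order. SQLite has no table partitioning, so the monthly partition is left out here; PostgreSQL, for example, offers declarative range partitioning on created_at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mcp_audit_log (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,   -- uniform UTC ISO-8601 strings sort chronologically
    personal_access_token_id INTEGER NOT NULL,
    tool_name TEXT NOT NULL,
    outcome TEXT NOT NULL,
    linkedin_account_id TEXT NOT NULL
);
-- One compound index per query pattern; created_at second so range scans use it.
CREATE INDEX audit_by_token   ON mcp_audit_log (personal_access_token_id, created_at);
CREATE INDEX audit_by_account ON mcp_audit_log (linkedin_account_id, created_at);
CREATE INDEX audit_by_outcome ON mcp_audit_log (outcome, created_at);
""")

# "Everything this token did since <cutoff>"
TOKEN_LAST_24H = """SELECT created_at, tool_name, outcome FROM mcp_audit_log
 WHERE personal_access_token_id = ? AND created_at >= ? ORDER BY created_at"""
# "Every write against this LinkedIn account since <start of day>"
ACCOUNT_TODAY = """SELECT created_at, tool_name, outcome FROM mcp_audit_log
 WHERE linkedin_account_id = ? AND created_at >= ? ORDER BY created_at"""
# "Every failed write since <cutoff>"
FAILED_LAST_HOUR = """SELECT created_at, personal_access_token_id, tool_name, outcome
 FROM mcp_audit_log
 WHERE outcome IN ('rejected_throttle', 'rejected_validation', 'rejected_tos')
 AND created_at >= ? ORDER BY created_at"""
```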

The 3 anti-patterns we rejected

Three temptations are common in MCP design, and three concrete decisions push back on each. Naming them out loud, with their fix, is half the discipline.

Anti-pattern 1, rapid bursting allowed because “the LLM will be polite”

Some MCP servers ship without throttling on the assumption that the LLM client will self-pace. That assumption breaks on day one. LLMs loop on ambiguous results, fan out on parallel prompts, and have no concept of the rate ceiling at the destination. The fix is non-negotiable: enforce throttling at the server, return 429s on breach, and let the client adapt. The agent is not the principal; the token is.

Anti-pattern 2, opaque auth where tokens can’t be enumerated or revoked

Some early MCP implementations issued long-lived API keys that lived only in environment variables, with no user-visible list and no revoke button. That is the same anti-pattern behind years of AWS access keys leaking through public GitHub repositories. The fix: every token is a row in a table, every row has a revoke UI, and revocation takes effect before the next request.

Anti-pattern 3, audit logs as an afterthought

Logging the request body to a tail -f-able file is not an audit log. An audit log is a durable, indexed, append-only record that survives the server it was generated on. Anything else collapses the first time a TOS investigation lands on the operator’s desk. The fix: write the audit row inside the same database transaction as the action it records, with PII separated and indexes built before the table is large.
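
One way to make “append-only” a property of the database rather than a convention is shown below with SQLite triggers from Python’s standard library: any UPDATE or DELETE on the audit table aborts. The table and trigger names are hypothetical. With this in place, a retention job needs its own deliberate path, such as dropping expired monthly partitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mcp_audit_log (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,
    tool_name TEXT NOT NULL,
    outcome TEXT NOT NULL
);
-- Reject in-place edits and deletes so rows can only ever be appended.
CREATE TRIGGER audit_no_update BEFORE UPDATE ON mcp_audit_log
BEGIN SELECT RAISE(ABORT, 'mcp_audit_log is append-only'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON mcp_audit_log
BEGIN SELECT RAISE(ABORT, 'mcp_audit_log is append-only'); END;
""")
```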

Production checklist for your own MCP server

If you are building an MCP server today, the ten items below are the production floor. Each one is doable in an afternoon and saves a week of incident response over the first year of traffic.

Bearer tokens issued per principal, hashed at rest, validated on every request via standard middleware (Sanctum or equivalent).
Token scopes (read, write, execute) enforced at the tool boundary, not the route boundary.
A user-visible token list with a one-click revoke button that performs a synchronous DELETE.
Per-token rate limiting at the HTTP edge, 60 to 120 req/min for typical agent loops.
Per-principal daily action ceilings at the business-logic layer, configurable per tenant.
A 429 Too Many Requests response with a Retry-After header on every throttle breach.
An audit row written in the same transaction as every write tool call, indexed on (principal_id, created_at).
PII excluded from the audit body; cleartext lives in the operational tables with their own retention.
Audit retention floor of 180 days, soft ceiling of 12 months for GDPR alignment.
A documented incident playbook covering leaked-token revocation, throttle-breach analysis, and TOS-investigation response.

That list is the production floor, not the ceiling. Mature deployments add request signing, hardware-bound tokens, anomaly detection on the audit stream, and tenant-level isolation. The ten items above are what gets you to first revenue without a Friday-night incident.

How Fintalio applies all three in one stack

Fintalio’s LinkedIn MCP server ships every pattern above in production today. Authentication uses Laravel Sanctum, with bearer tokens visible in Settings → Tokens API & MCP and one-click revocation in the same panel. Rate limiting caps the /mcp route mount at 120 requests per minute per Sanctum token through Laravel’s throttle middleware, returning standard 429 responses with Retry-After. The action layer adds per-LinkedIn-account daily ceilings (50 messages, 50 connections by default), configurable per tenant.

The MCP surface itself is 19 tools and 3 resources, deliberately scoped to outreach and sequence management. Every write tool logs synchronously to the audit table before the underlying action leaves the box. The LinkedIn session is captured through a hosted auth wizard so the user enters credentials only on LinkedIn’s own UI; Fintalio’s session infrastructure holds the resulting account reference. Single plan, €69 per month, MCP access bundled.

If you want the same operational discipline applied to your own agent’s LinkedIn workflow, plug Fintalio into Claude Desktop and inspect the audit panel after your first sequence.

FAQ

Why bearer tokens instead of OAuth2 for MCP?

OAuth2’s authorization-code flow assumes a human in a browser approving a third-party app. MCP clients are trusted local binaries (Claude Desktop, Cursor, your own script), so the redirect dance adds three moving parts without a matching security gain. Bearer tokens per RFC 6750, issued from a user settings panel and revocable in one click, fit the MCP threat model exactly.

What’s the right rate limit for an MCP server?

60 to 120 requests per minute per token covers interactive agent loops (5 to 20 tool calls per turn, 6 to 24 turns per minute) without bottlenecking legitimate use. Fintalio uses 120 req/min per token through Laravel’s throttle middleware. OWASP’s API4:2023 guidance calls for rate limiting, and the token is the right unit to key it on because it is the only stable principal in the MCP loop.

How long should audit logs be retained?

The floor is 180 days and the soft ceiling is 12 months. The floor covers the longest plausible TOS-investigation window from a platform partner; the ceiling honors GDPR’s storage-limitation principle. Index on (token_id, created_at) and (account_id, created_at) so the two most common queries (this token in 24h, this account today) stay fast as the table grows.

Does the LinkedIn user enter credentials on Fintalio’s UI?

No. The user enters credentials directly on LinkedIn through a hosted auth wizard served by Fintalio’s session infrastructure. The session token returns to Fintalio scoped to that user’s connected account. This first-party auth pattern keeps every action inside the user’s own authenticated, consented session, unlike the third-party scraping at issue in hiQ Labs v. LinkedIn.

What happens when a token gets leaked?

Open Settings → Tokens API & MCP, click revoke on the affected token, and reissue a fresh one with a new label. Revocation is synchronous: the next request from the leaked token returns 401 before any tool fires. Query the audit log on personal_access_token_id to see exactly what the leaked token did between issuance and revocation, then update the affected Claude Desktop or Cursor config.
