What is an Agent-Readiness Score?

An Agent-Readiness Score (0-100) measures how well AI agents can discover, understand, and recommend your business. It evaluates structured data, LLM readability, technical accessibility, agentic SEO, protocol discovery, and authority signals.

How does AgentLayers help with EU AI Act compliance?

AgentLayers automatically evaluates AI agents against EU AI Act requirements including risk classification, transparency obligations, and documentation standards. It provides compliance scoring and readiness checklists aligned with Regulation (EU) 2024/1689.

Is the Agent-Readiness Scanner free?

Yes, all features are free during the beta phase. You can run unlimited scans without signing up. Create a free account to save your scan history and track improvements over time.

What is the AgentLayers Trust Score for AI agents?

The Trust Score (0-100) evaluates AI agents across multiple dimensions: security, interoperability, documentation, and reliability. High-scoring agents earn a verified AgentLayers Certified badge and are listed in our curated agent directory.

Agent-Readiness Methodology: 6 Weighted Criteria

Abstract

As AI agents become critical infrastructure in business and consumer applications, the need for a standardized, reproducible, and transparent evaluation framework becomes paramount. This paper presents the dual-scoring methodology developed by AgentLayers: the Agent-Readiness Score, which measures a website's capacity to be discovered, understood, and recommended by AI agents (score 0–100); and the Trust Score, which evaluates the reliability, transparency, security, and compliance of AI agents through fully automated, timestamped, and publicly auditable tests. We detail each criterion, its weighting, the testing protocol, and the technical implementation. All tests are designed to be reproducible and independent of any commercial relationship with the evaluated entities.

1 Introduction

The rapid proliferation of AI agents — from autonomous assistants to specialized task executors — has created a trust deficit in the ecosystem. Businesses deploying agents need assurance of reliability and compliance; end-users need guarantees of safety and transparency; and businesses wanting to be discovered by agents need to adapt their digital presence.

AgentLayers addresses this gap with two complementary evaluation instruments. The Agent-Readiness Score targets businesses seeking to optimize their web presence for the agentic economy. The Trust Score targets AI agents themselves, providing an objective, automated assessment of their operational quality. Additionally, the Skill Trust Score evaluates third-party skills and plugins before they are installed on AI agents, detecting prompt injection, obfuscation, excessive permissions, and supply chain risks. The MCP Server Trust Score evaluates Model Context Protocol server configurations for endpoint security, permission scope, data exfiltration, and auth weaknesses. The A2A Protocol Trust Score assesses Agent-to-Agent protocol implementations for authentication, message signing, delegation control, and identity verification.

Design principle: Every test described in this methodology is fully automated, reproducible by any third party, and timestamped for temporal tracking. No subjective human evaluation is involved in the scoring process.

2 Agent-Readiness Score (Businesses)

The Agent-Readiness Score measures a website's capacity to be discovered, understood, and recommended by AI agents. The score ranges from 0 to 100 and is computed automatically by crawling the site and analyzing its structure, content, and metadata.

Criterion	Weight	Description
Structured Data	20%	JSON-LD, Open Graph, Microdata, RDFa detection and richness
LLM Readability	15%	Ability for LLMs to comprehend positioning, offer and value proposition
Technical Accessibility	10%	Crawlability, response time, SSL, robots.txt, sitemap
Agentic SEO	15%	Likelihood of agent recommendation over competitors
Protocol Discovery	15%	Well-known endpoints for MCP, OAuth, A2A, API Catalog, Agent Skills (Cloudflare-aligned)
Authority	25%	External credibility — Wikipedia presence, domain age, Wayback footprint, Open PageRank, Tranco rank
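The composite score is a weighted sum of the six criterion sub-scores. The Python sketch below assumes each sub-score has already been computed on a 0–100 scale; the dictionary keys, the clamping step, and the example sub-scores are illustrative assumptions, not AgentLayers' production code.

```python
# Criterion weights from the table above (they sum to 1.0, i.e. 100%).
WEIGHTS = {
    "structured_data": 0.20,
    "llm_readability": 0.15,
    "technical_accessibility": 0.10,
    "agentic_seo": 0.15,
    "protocol_discovery": 0.15,
    "authority": 0.25,
}


def agent_readiness_score(subscores: dict) -> int:
    """Weighted sum of 0-100 criterion sub-scores, rounded to an integer 0-100."""
    if set(subscores) != set(WEIGHTS):
        raise ValueError("expected exactly one sub-score per criterion")
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Clamp to 0-100 so no criterion can contribute more than its weight.
        value = min(max(float(subscores[name]), 0.0), 100.0)
        total += weight * value
    return round(total)


# Hypothetical sub-scores for one site.
example = {
    "structured_data": 80,
    "llm_readability": 70,
    "technical_accessibility": 90,
    "agentic_seo": 50,
    "protocol_discovery": 20,
    "authority": 40,
}
print(agent_readiness_score(example))  # 56
```

With these example sub-scores the contributions are 16 + 10.5 + 9 + 7.5 + 3 + 10 = 56.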

2.1 Structured Data (20%)

Structured data constitutes the primary signal analyzed by AI agents. It transforms human-readable content into machine-readable information. Our crawler (built on Cheerio) parses the HTML document and extracts all <script type="application/ld+json"> blocks, itemscope/itemtype attributes (Microdata), typeof/property attributes (RDFa), and og:* meta tags.

Each JSON-LD block is parsed and its @type field is extracted. Types are compared against a curated list of high-value schema types (Organization, Product, FAQPage, LocalBusiness, etc.). Each sub-criterion is scored against a theoretical maximum of 100, and its contribution to the overall score is capped at the section weight.
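The extraction step can be sketched with Python's standard-library `html.parser` standing in for Cheerio. This is a minimal sketch, not the production crawler: the `HIGH_VALUE_TYPES` set is an illustrative subset, the class and function names are hypothetical, and the per-sub-criterion scoring is left out.

```python
import json
from html.parser import HTMLParser

# Illustrative subset of high-value schema.org types (assumption).
HIGH_VALUE_TYPES = {"Organization", "Product", "FAQPage", "LocalBusiness"}


class StructuredDataExtractor(HTMLParser):
    """Collects JSON-LD @type values, Microdata itemtypes, RDFa typeof values and og:* tags."""

    def __init__(self):
        super().__init__()
        self.jsonld_types = set()
        self.microdata_types = set()
        self.rdfa_types = set()
        self.og_tags = {}
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "script" and a.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []
        if a.get("itemtype"):
            # Keep the last path segment: https://schema.org/Product -> Product
            self.microdata_types.add(a["itemtype"].rstrip("/").rsplit("/", 1)[-1])
        if a.get("typeof"):
            self.rdfa_types.update(a["typeof"].split())
        if tag == "meta" and (a.get("property") or "").startswith("og:"):
            self.og_tags[a["property"]] = a.get("content") or ""

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self._collect_jsonld("".join(self._buffer))

    def _collect_jsonld(self, raw):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            return  # malformed blocks are skipped
        stack = [data]
        while stack:  # walk nested objects, lists and @graph arrays
            node = stack.pop()
            if isinstance(node, list):
                stack.extend(node)
            elif isinstance(node, dict):
                t = node.get("@type")
                if isinstance(t, str):
                    self.jsonld_types.add(t)
                elif isinstance(t, list):
                    self.jsonld_types.update(x for x in t if isinstance(x, str))
                stack.extend(node.values())


def structured_data_report(html):
    parser = StructuredDataExtractor()
    parser.feed(html)
    parser.close()
    return {
        "jsonld_types": sorted(parser.jsonld_types),
        "high_value": sorted(parser.jsonld_types & HIGH_VALUE_TYPES),
        "microdata": sorted(parser.microdata_types),
        "rdfa": sorted(parser.rdfa_types),
        "og_tags": parser.og_tags,
    }


page = """<html><head>
<meta property="og:title" content="Acme">
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [
  {"@type": "Organization", "name": "Acme"},
  {"@type": "WebSite", "name": "Acme"}]}
</script></head>
<body><div itemscope itemtype="https://schema.org/Product"></div></body></html>"""

report = structured_data_report(page)
print(report["high_value"])  # ['Organization']
```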

2.2 LLM Readability (15%)

This criterion assesses whether an LLM can comprehend the positioning, offer, and value proposition of a business by reading the page content. We evaluate clarity of value proposition, content structure and hierarchy, presence of pricing information, and use of natural language descriptions optimized for AI comprehension. The Live Test feature — shipped in production on PRO plans — probes real LLMs with brand-aware queries that include the domain name (e.g. "What do you know about Acme (acme.com)?") to disambiguate same-name brands and improve recall accuracy.
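Two of the structural checks (heading hierarchy and presence of pricing information) can be approximated with simple heuristics. The sketch below is hypothetical: the rules (exactly one `h1`, no skipped heading levels, a currency amount or per-month pattern as evidence of pricing) and the helper names are assumptions chosen for illustration, not the published scoring rules.

```python
import re
from html.parser import HTMLParser

# Hypothetical pricing evidence: a currency amount, an ISO currency code, or "/month".
PRICE_PATTERN = re.compile(
    r"(?:[$€£]\s?\d[\d,.]*|\d[\d,.]*\s?(?:USD|EUR|GBP)\b|/\s?month\b)", re.IGNORECASE
)


class HeadingCollector(HTMLParser):
    """Records heading levels (h1-h6) in document order plus all visible text."""

    def __init__(self):
        super().__init__()
        self.levels = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))

    def handle_data(self, data):
        self.text.append(data)


def readability_signals(html):
    p = HeadingCollector()
    p.feed(html)
    p.close()
    levels = p.levels
    # Going deeper by more than one level (h2 -> h4) counts as a skipped level.
    no_skips = all(b - a <= 1 for a, b in zip(levels, levels[1:]))
    return {
        "single_h1": levels.count(1) == 1,
        "no_skipped_levels": bool(levels) and levels[0] == 1 and no_skips,
        "has_pricing": bool(PRICE_PATTERN.search(" ".join(p.text))),
    }


print(readability_signals("<h1>Acme</h1><h2>Plans</h2><p>Pro: $29/month</p><h4>FAQ</h4>"))
```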

2.3 Technical Accessibility (10%)

This criterion evaluates whether an AI agent can technically access the site's content without friction. Sub-criteria include: response time (<2s), SSL/TLS configuration, robots.txt accessibility, sitemap presence, proper HTTP status codes, and absence of aggressive anti-bot measures that would block legitimate AI crawlers.
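Several of these sub-criteria can be evaluated offline once the raw responses have been collected. The sketch below uses Python's `urllib.robotparser` to test whether AI crawlers are disallowed and whether a sitemap is declared, then combines that with probe data (status code, elapsed time, URL scheme) assumed to be gathered by a separate fetch. The crawler list and the `accessibility_checks` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative AI crawler user agents (hypothetical list, not AgentLayers' own).
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def robots_report(robots_txt, page_url):
    """Parse robots.txt text offline and report blocked AI crawlers and sitemaps."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {
        "blocked_agents": [ua for ua in AI_CRAWLERS if not rp.can_fetch(ua, page_url)],
        "sitemaps": rp.site_maps() or [],  # site_maps() returns None if none declared
    }


def accessibility_checks(status, elapsed_s, scheme, robots):
    """Combine probe results (collected elsewhere) into pass/fail flags."""
    return {
        "ok_status": 200 <= status < 300,
        "fast": elapsed_s < 2.0,  # the < 2 s target from the text
        "https": scheme == "https",  # scheme only; certificate checks happen at fetch time
        "sitemap": bool(robots["sitemaps"]),
        "ai_crawlers_allowed": not robots["blocked_agents"],
    }


EXAMPLE_ROBOTS = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://acme.com/sitemap.xml
"""

report = robots_report(EXAMPLE_ROBOTS, "https://acme.com/pricing")
checks = accessibility_checks(200, 0.8, "https", report)
print(report["blocked_agents"])  # ['GPTBot']
```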

2.4 Agentic SEO (15%)

The most strategic criterion. It measures whether an AI agent would choose to recommend this business over a competitor. Factors include domain authority signals, citation frequency in AI training data, and semantic relevance of content to probable agent queries.

Agentic SEO complements the LLM tests (section 2.7): structural signals determine whether a business deserves to be recommended; the Live Discovery test verifies whether it actually is.
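The methodology does not specify how semantic relevance to probable agent queries is measured. As a plainly swapped-in stand-in, the sketch below uses bag-of-words cosine similarity between page text and a caller-supplied list of probable queries; a production system would more plausibly use embeddings, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def _vector(text):
    """Lower-cased term counts over alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def query_relevance(page_text, probable_queries):
    """Mean cosine similarity between the page and each probable query, scaled to 0-100."""
    if not probable_queries:
        return 0.0
    page = _vector(page_text)
    sims = [cosine(page, _vector(q)) for q in probable_queries]
    return 100 * sum(sims) / len(sims)


page_text = "Acme sells industrial anvils and rocket skates with next-day delivery."
queries = ["where can I buy an industrial anvil", "best rocket skates delivery"]
print(round(query_relevance(page_text, queries), 1))
```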

2.5 Protocol Discovery (15%)

Beyond content and metadata, agents need machine-readable entry points: which APIs to call, where to authenticate, what skills are exposed. Protocol Discovery probes six well-known endpoints aligned with the IETF / OpenAPI / MCP / A2A specifications and Cloudflare's agent-readiness category. We validate the RFC-mandatory fields and ping the endpoints advertised in the metadata — a 200 response with valid-shape JSON is not enough. WebMCP is shown as a 7th informational row but is not counted in the score (it requires a headless browser to verify).

MCP Server Card describing tools and capabilities (MCP SEP-1649).
OAuth 2.0 Authorization Server metadata (RFC 8414). OIDC discovery accepted as fallback.
OAuth 2.0 Protected Resource metadata (RFC 9728).
API Catalog linkset pointing at OpenAPI / AsyncAPI service descriptions (RFC 9727).
Google Agent2Agent (A2A) agent card.
Agent Skills index (Cloudflare proposal).
WebMCP runtime registration of browser-callable tools (best-effort; full check requires a headless browser).
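Validating RFC-mandatory fields can be sketched as pure functions over JSON documents that have already been fetched. This sketch covers only the three RFC-defined documents whose well-known paths are fixed by their specifications (RFC 8414, RFC 9728, RFC 9727); the MCP, A2A, and Agent Skills paths are omitted rather than guessed. The checks are simplified (RFC 8414 lets `authorization_endpoint` or `token_endpoint` be absent for some grant-type configurations, and issuers with path components use a different well-known URL construction), and all helper names are hypothetical.

```python
from urllib.parse import urljoin

# Well-known paths fixed by their RFCs (root issuer / resource only).
WELL_KNOWN = {
    "oauth_authorization_server": "/.well-known/oauth-authorization-server",  # RFC 8414
    "openid_configuration": "/.well-known/openid-configuration",  # OIDC fallback
    "oauth_protected_resource": "/.well-known/oauth-protected-resource",  # RFC 9728
    "api_catalog": "/.well-known/api-catalog",  # RFC 9727
}


def well_known_url(origin, name):
    return urljoin(origin, WELL_KNOWN[name])


def validate_oauth_as(doc, issuer):
    """Simplified RFC 8414 check: returns a list of problems (empty means pass)."""
    problems = []
    if doc.get("issuer") != issuer:
        problems.append("issuer missing or does not match the queried issuer")
    if not doc.get("response_types_supported"):
        problems.append("response_types_supported missing")
    if not (doc.get("authorization_endpoint") or doc.get("token_endpoint")):
        problems.append("neither authorization_endpoint nor token_endpoint present")
    return problems


def validate_protected_resource(doc, resource):
    """RFC 9728: `resource` is required and must match the resource identifier."""
    return [] if doc.get("resource") == resource else ["resource missing or mismatched"]


def validate_api_catalog(doc):
    """RFC 9727 catalogs are typically served in the RFC 9264 linkset JSON format."""
    linkset = doc.get("linkset")
    return [] if isinstance(linkset, list) and linkset else ["linkset missing or empty"]


def advertised_endpoints(doc):
    """Endpoint URLs advertised in the metadata; a follow-up step would ping these."""
    return sorted(v for k, v in doc.items() if k.endswith("_endpoint") and isinstance(v, str))


metadata = {
    "issuer": "https://auth.acme.com",
    "authorization_endpoint": "https://auth.acme.com/authorize",
    "token_endpoint": "https://auth.acme.com/token",
    "response_types_supported": ["code"],
}
print(validate_oauth_as(metadata, "https://auth.acme.com"))  # []
print(advertised_endpoints(metadata))
```

Returning a list of problems rather than a boolean keeps the "200 with valid-shape JSON is not enough" rule auditable: each failed field is reported by name.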

This dimension was directly informed by Cloudflare's launch of isitagentready.com (April 2026), which validated Protocol Discovery as a first-class category of agent-readiness. AgentLayers extends the same checklist with deeper, code-level analysis — Trust Score (GitHub repository analysis), Skill / MCP / A2A security scanners, and EU AI Act compliance. The two tools are complementary: Cloudflare's free scan tells you whether your endpoints exist; AgentLayers tells you whether the agents and skills behind them are trustworthy. Source:

2.6 Authority (25%)

The first five dimensions measure how well a site is structured for agents — they tell you whether your content can be extracted, parsed, and acted on. They do not tell you whether an LLM will actually cite you. That signal lives outside the page: in Wikipedia, in the link graph, in the Wayback footprint, in the years your domain has been online.

Authority is the dimension that closes the validity gap. It combines five free, independently verifiable sub-signals, none of which can be gamed in a short sprint:

Open PageRank (30%): a Domain Authority-style 0–10 score derived from the public web graph (free tier, 1,000 lookups per month).
Wikipedia presence (20%): does a Wikipedia article exist for this domain, and in how many languages? A site with a 30-language Wikipedia article is almost certainly in every major LLM's training set.
Tranco rank (20%): research-grade top-1M domain ranking with log-scale normalisation (rank 1 → 100, rank 1M → 0).
Wayback Machine first seen (15%): years of continuity archived by archive.org. Sites crawled since the 1990s are very likely to have been memorised by major training runs.
Domain age (15%): registration date via RDAP. A floor that prevents brand-new domains from impersonating established players.
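The Tranco normalisation stated above (rank 1 → 100, rank 1M → 0 on a log scale) works out to 100 × (1 − log10(rank) / 6). The sketch below applies it and combines the five sub-signals with the stated weights. The Wikipedia and age mappings (linear ramps capped at 30 languages and 20 years) are hypothetical assumptions, since the methodology does not publish them.

```python
import math

AUTHORITY_WEIGHTS = {
    "open_pagerank": 0.30,
    "wikipedia": 0.20,
    "tranco": 0.20,
    "wayback": 0.15,
    "domain_age": 0.15,
}


def tranco_score(rank):
    """Log-scale normalisation: rank 1 -> 100, rank 1,000,000 -> 0; unranked -> 0."""
    if rank is None or rank < 1:
        return 0.0
    return max(0.0, 100.0 * (1 - math.log10(rank) / 6))


def years_score(years, cap=20):
    """Hypothetical linear ramp: 0 years -> 0, `cap` years or more -> 100."""
    return 100.0 * min(max(years, 0), cap) / cap


def wikipedia_score(languages, cap=30):
    """Hypothetical: no article -> 0, scaled by language count up to `cap`."""
    return 100.0 * min(max(languages, 0), cap) / cap


def authority_score(open_pagerank, wiki_languages, tranco_rank, wayback_years, domain_years):
    parts = {
        "open_pagerank": open_pagerank * 10,  # 0-10 -> 0-100
        "wikipedia": wikipedia_score(wiki_languages),
        "tranco": tranco_score(tranco_rank),
        "wayback": years_score(wayback_years),
        "domain_age": years_score(domain_years),
    }
    return sum(AUTHORITY_WEIGHTS[k] * v for k, v in parts.items())


print(round(authority_score(5, 0, 1000, 10, 10), 1))  # 40.0
```

For a domain with Open PageRank 5, no Wikipedia article, Tranco rank 1,000, and ten years of Wayback and registration history, every non-zero sub-signal lands at 50 and the composite is 0.30·50 + 0.20·50 + 0.15·50 + 0.15·50 = 40.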

Authority is mostly out of your short-term control. Don't treat a 40/100 Authority as a failing grade — it tells you LLMs haven't memorised you yet, which is the rational starting point for any sub-decade-old site. Focus on dimensions 2.1–2.5 (structure) for the next 90 days and on earning citations from high-Tranco sites for the next 18 months.


2.7 LLM Tests: Knowledge Recall + Live Discovery (PRO)

The Agent-Readiness Score is not the only signal. On every PRO scan, AgentLayers runs two complementary live LLM tests because they answer different questions. Reading them together is what produces an honest verdict; neither test alone is sufficient.

2.7.1 Knowledge Recall (offline, training-data only)

The Recall test queries a chat model with no web access (default: gpt-4o-mini). It asks brand-aware questions like "What do you know about Acme (acme.com)?" along with category and location prompts derived from your site's structured data. It measures whether the model already knows your brand from its training data.

Detection is mention-based and negation-aware: brand echoes wrapped in phrases like "I don't know" or "I'm not familiar with" are not counted as a mention. Each prompt is classified Direct, Likely, or Absent. This signal is most meaningful for established brands with significant pre-existing web presence at the model's training cutoff.
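Prompt construction and negation-aware classification can be sketched as follows. The category and location prompt wording, the negation phrase list, and the mapping to the three classes (Direct when the domain is mentioned outside a negation, Likely when only the brand name is, Absent otherwise) are hypothetical: the methodology names Direct, Likely, and Absent but does not define their thresholds.

```python
import re

# Hypothetical negation phrases; a sentence containing one is ignored.
NEGATIONS = ("i don't know", "i do not know", "i'm not familiar with",
             "i am not familiar with", "no information about", "not aware of")


def recall_prompts(brand, domain, category=None, location=None):
    prompts = [f"What do you know about {brand} ({domain})?"]
    if category:
        prompts.append(f"Which {category} companies would you recommend?")  # hypothetical wording
    if category and location:
        prompts.append(f"What are the best {category} providers in {location}?")  # hypothetical wording
    return prompts


def classify_mention(answer, brand, domain):
    """Return 'Direct', 'Likely' or 'Absent' (hypothetical thresholds)."""
    direct = likely = False
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        s = sentence.lower().replace("\u2019", "'")  # normalise curly apostrophes
        if any(n in s for n in NEGATIONS):
            continue  # a brand echoed inside a negation does not count
        if domain.lower() in s:
            direct = True
        elif brand.lower() in s:
            likely = True
    return "Direct" if direct else "Likely" if likely else "Absent"


print(classify_mention("I'm not familiar with Acme (acme.com).", "Acme", "acme.com"))  # Absent
```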

A low Recall score for a recent or low-traffic site is expected and not a failure — it simply means the model hasn't memorized you yet. Read the Discovery test below for the live signal.

2.7.2 Live Discovery (web-grounded)

The Discovery test queries a search-enabled model (default: gpt-4o-mini-search-preview) with category-level prompts that deliberately do not name your brand. It measures whether AI agents organically surface your site when a real user asks a realistic question today, using live web search results.

A prompt counts as a hit if either (a) your brand name appears in the answer or (b) at least one citation URL points to your own domain. Each result includes the cited sources so you can audit which pages the model actually fetched. This is the signal that matches what users see when they prompt ChatGPT, Perplexity, or Claude with web search enabled.
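The hit rule above translates directly into code. In this sketch the brand match is a plain case-insensitive substring test and subdomains count as the site's own domain; both are assumptions, and the helper names are hypothetical.

```python
from urllib.parse import urlparse


def is_own_domain(url, domain):
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def discovery_hit(answer, citations, brand, domain):
    """Hit if (a) the brand is named in the answer or (b) any citation points to the domain."""
    named = brand.lower() in answer.lower()
    cited = any(is_own_domain(u, domain) for u in citations)
    return named or cited


print(discovery_hit("Top picks: Foo and Bar.", ["https://www.acme.com/blog"], "Acme", "acme.com"))  # True
```

Matching on the parsed hostname, rather than searching the raw URL string, keeps look-alike hosts such as `notacme.com` from counting as hits.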

PRO feature: Both tests make real API calls. The Recall test is billed at standard chat-completion rates, and the Discovery test adds a web-search surcharge per prompt. Both are available on PRO plans and above; FREE scans show a locked preview. Neither test is factored into the Agent-Readiness Score: they are qualitative signals displayed alongside the score, and they are designed to be read together.

3 Foundational Principles

Reproducibility

Every test can be re-run by any party and will produce the same score (within stochastic LLM variance). The methodology is documented publicly.

Temporal Evolution

Scores are not static. Each test is timestamped and stored. Users see score evolution curves over time. Degrading agents are flagged; improving agents gain visibility.
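One way to flag degrading agents is to fit a least-squares slope to the timestamped score history. The sketch below does that; the −0.1 points-per-day threshold and the function names are hypothetical, since the methodology does not state how degradation is detected.

```python
from datetime import datetime


def score_trend(history):
    """Least-squares slope of score vs. time, in points per day.

    `history` is a list of (ISO-8601 timestamp with UTC offset, score) pairs.
    """
    points = [(datetime.fromisoformat(ts).timestamp() / 86400, s) for ts, s in history]
    n = len(points)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in points) / n
    mean_s = sum(s for _, s in points) / n
    var = sum((t - mean_t) ** 2 for t, _ in points)
    if var == 0:
        return 0.0  # all scans at the same instant: no trend
    cov = sum((t - mean_t) * (s - mean_s) for t, s in points)
    return cov / var


def is_degrading(history, threshold=-0.1):
    """Hypothetical rule: flag if the score drops faster than `threshold` points/day."""
    return score_trend(history) < threshold


history = [
    ("2026-01-01T00:00:00+00:00", 80),
    ("2026-01-11T00:00:00+00:00", 70),
    ("2026-01-21T00:00:00+00:00", 60),
]
print(round(score_trend(history), 3))  # -1.0
```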

Methodology Transparency

The complete methodology is published in open access. The competitive advantage is not the methodology — it's execution, accumulated data, and network effects.

Independence

AgentLayers does not sell optimization services to the agents it rates. The business model (subscriptions, premium listings, API) creates no conflict of interest with scoring.

References

Schema.org — Structured Data Vocabulary, https://schema.org (accessed March 2026).
Google — Structured Data Testing Tool Documentation, https://developers.google.com/search/docs/appearance/structured-data (2025).
European Commission — EU Artificial Intelligence Act, Regulation (EU) 2024/1689 (2024).
GDPR — General Data Protection Regulation, Regulation (EU) 2016/679 (2016).
OWASP — LLM Top 10 Security Risks, v1.1 (2024).

This document constitutes the public technical reference for AgentLayers's scoring algorithms. It is updated as the product evolves and serves as the basis for all evaluation processes.
© 2026 AgentLayers Research — Open methodology, v1.0 — March 2026
