Automated Content Quality Checks: The 24-Factor Framework
Most SEO tools give you a single quality score with no explanation of how they got there. That number is the output of someone else’s algorithm, applied to your content, with all the methodology hidden in a black box.
Here’s why this matters technically: a black-box score doesn’t teach you anything. If your content scores 61, you don’t know whether to rewrite the introduction, adjust keyword density, shorten your sentences, or add more internal links. The score is useless without the breakdown.
Let me walk you through the 24-factor automated content quality checks system we built into Agentic Marketing’s content pipeline. I’ll cover the module architecture, what each factor measures, why it’s weighted the way it is, and how the composite score is produced. By the end, you’ll understand exactly how this class of system works — whether you’re evaluating our platform or building your own.
Why 24 Factors?
The decision to use 24 factors wasn’t arbitrary. It came from analyzing what actually distinguishes high-ranking content from low-ranking content across a large corpus of pages.
Most AI SEO tools consolidate everything into three or four categories: keyword, readability, structure, links. That’s accurate at the highest level, but it’s not actionable. “Improve your keyword score” doesn’t tell you whether you need to increase density, add semantic variations, improve placement in headings, or address stuffing issues. These require separate measurements.
Our 24 factors map to 6 domains, each with 4 distinct sub-factors:
- Keyword Quality (4 factors) — density, distribution, variation, stuffing detection
- Search Intent (4 factors) — query classification, content type match, structure alignment, CTA placement
- Content Depth (4 factors) — word count vs. SERP median, topic coverage breadth, section depth, information density
- Readability (4 factors) — sentence length, reading grade level, paragraph structure, active/passive ratio
- Technical Structure (4 factors) — heading hierarchy, meta title, meta description, URL quality
- Authority Signals (4 factors) — internal link count, internal link anchor quality, external authority links, E-E-A-T indicators
Each domain contributes a weighted sub-score to the composite 0–100.
Module Architecture
The quality scoring system is built in Python as a set of composable modules. Here’s the implementation structure:
```
data_sources/modules/
├── keyword_analyzer.py           # Keyword domain (4 factors)
├── search_intent_analyzer.py     # Intent domain (4 factors)
├── content_length_comparator.py  # Depth domain (4 factors)
├── readability_scorer.py         # Readability domain (4 factors)
├── seo_quality_rater.py          # Composite scorer + tech structure
└── __init__.py
```
The seo_quality_rater.py module acts as the orchestrator. It imports the four domain sub-modules, runs them against the article content, collects sub-scores, applies the domain weighting, and returns a composite score with the full per-factor breakdown.
The interface looks like this in practice:
```python
from data_sources.modules.seo_quality_rater import SEOQualityRater

rater = SEOQualityRater(
    content=article_text,
    keyword="automated content quality checks",
    secondary_keywords=["content quality automation", "seo quality scoring"],
    serp_median_words=2400,
)
result = rater.score()
# Returns: {'composite': 82, 'keyword': 21, 'intent': 18, 'depth': 17, ...}
```
Each sub-module is independently testable and can be run in isolation for debugging. Let me walk through each domain.
Domain 1: Keyword Quality (25% Weight)
This domain covers four distinct keyword factors:
Factor 1.1: Keyword Density (8%)
Measures the ratio of primary keyword occurrences to total word count. Target range: 1.0–1.5%. Calculated using all natural variations, not just exact-match.
The implementation uses a normalized term frequency calculation that accounts for stop words and treats morphological variants as equivalent (e.g., “optimize content” and “content optimization” both count toward the density of a “content optimization” target keyword).
Under-density (< 0.7%) and over-density (> 2.0%) both receive zero on this factor. The scoring curve is parabolic — peak score at 1.2%, declining toward the bounds.
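The parabolic curve can be sketched in a few lines. This is an illustrative implementation, not the platform's actual code: the function name and the quadratic falloff are assumptions consistent with the stated peak (1.2%) and zero bounds (0.7% and 2.0%).

```python
def density_score(density_pct: float, peak: float = 1.2,
                  low: float = 0.7, high: float = 2.0) -> float:
    """Score keyword density on a 0-1 parabolic curve peaking at `peak`.

    Outside [low, high] the factor scores zero; inside, the score
    declines quadratically from the peak toward each bound.
    """
    if density_pct < low or density_pct > high:
        return 0.0
    # Measure distance toward whichever bound lies on this side of the peak
    bound = low if density_pct < peak else high
    ratio = (density_pct - peak) / (bound - peak)  # 0 at peak, 1 at bound
    return 1.0 - ratio ** 2

density_score(1.2)  # peak of the curve: full credit
density_score(1.0)  # inside the target range: partial credit
```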
Factor 1.2: Keyword Distribution (7%)
Checks whether the keyword is distributed across the document rather than concentrated in specific sections. The algorithm splits the document into thirds and calculates the coefficient of variation across sections. Heavily front-loaded or back-loaded articles score poorly even if overall density is correct.
This factor specifically catches a common AI writing failure mode: front-loading keywords in the introduction to “SEO optimize” the opening and then drifting away from the topic in the body.
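Here is a minimal sketch of the thirds-based coefficient-of-variation check. The linear CV-to-score mapping and the substring matching are simplifying assumptions; the module's actual curve is not specified in this article.

```python
import statistics

def distribution_score(text: str, keyword: str, sections: int = 3) -> float:
    """Score keyword spread across the document.

    Splits the text into `sections` equal parts, counts keyword
    occurrences per part, and converts the coefficient of variation
    (CV) into a score: perfectly even spread (CV = 0) scores 1.0.
    """
    words = text.lower().split()
    kw = keyword.lower()
    size = max(1, len(words) // sections)
    counts = []
    for i in range(sections):
        end = (i + 1) * size if i < sections - 1 else len(words)
        chunk = " ".join(words[i * size:end])
        counts.append(chunk.count(kw))
    mean = statistics.mean(counts)
    if mean == 0:
        return 0.0  # keyword never appears
    cv = statistics.pstdev(counts) / mean
    return max(0.0, 1.0 - cv)  # linear mapping; the real curve is unspecified
```

A front-loaded article (all occurrences in the first third) produces a high CV and scores zero here even when overall density is in range.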
Factor 1.3: Keyword Variation (6%)
Measures semantic breadth. Are you using the exact same keyword phrase repeatedly, or are you covering the topic through natural variation? The analyzer compares your article’s term variety against a reference distribution of high-ranking content for the same topic cluster.
High variation score means: “automated content quality checks,” “content quality automation,” “quality scoring pipeline,” “automated quality review” are all present in addition to the primary keyword.
Factor 1.4: Stuffing Detection (4%)
Identifies unnatural keyword clustering — multiple exact-match appearances within a 50-word window. This is a penalty factor: a score of zero on 1.4 means the composite receives a penalty even if 1.1 through 1.3 are perfect.
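The sliding-window scan can be sketched as follows. The threshold of more than two exact matches per window is an assumption for illustration; the article specifies only "multiple" appearances within 50 words.

```python
def stuffing_flags(text: str, keyword: str, window: int = 50,
                   max_in_window: int = 2) -> bool:
    """Return True if the exact-match keyword appears more than
    `max_in_window` times inside any `window`-word span."""
    words = text.lower().split()
    kw_words = keyword.lower().split()
    n = len(kw_words)
    # Word indices where an exact-match occurrence of the keyword starts
    positions = [i for i in range(len(words) - n + 1)
                 if words[i:i + n] == kw_words]
    for idx, start in enumerate(positions):
        # Count occurrences beginning within `window` words of this one
        in_window = sum(1 for p in positions[idx:] if p - start < window)
        if in_window > max_in_window:
            return True
    return False
```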
Domain 2: Search Intent (20% Weight)
Factor 2.1: Query Classification (6%)
The search_intent_analyzer.py module classifies the target keyword using a fine-tuned intent classifier. Categories: informational, navigational, commercial investigation, transactional.
Classification accuracy on our benchmark corpus: 89% agreement with human raters. Mis-classifications typically occur at category boundaries (informational vs. commercial investigation keywords with mixed SERP signals).
Factor 2.2: Content Type Match (6%)
Compares your content’s structural signature against the expected pattern for the classified intent type. An informational keyword expects a guide structure (numbered steps or hierarchical sections, definitions, minimal CTAs). A commercial investigation keyword expects comparison tables, pros/cons sections, and explicit recommendations.
The module extracts structural features — heading patterns, CTA density, definition-paragraph ratios — and computes cosine similarity against stored templates for each intent type.
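The template comparison reduces to cosine similarity between feature vectors. The vector layout and template values below are hypothetical; only the technique (structural features scored against stored per-intent templates) comes from the text.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical features: [heading_count, cta_density, definition_ratio, table_count]
article_features = [12.0, 0.02, 0.30, 0.0]
templates = {
    "informational": [10.0, 0.01, 0.35, 0.0],
    "commercial":    [8.0, 0.08, 0.10, 3.0],
}
best = max(templates, key=lambda t: cosine_similarity(article_features, templates[t]))
```

An article with many headings, few CTAs, and no comparison tables lands closest to the informational template, as expected.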
Factor 2.3: Structure Alignment (5%)
Checks whether your section ordering matches the typical SERP-validated ordering for the content type. For how-to guides, this means: introduction → context → steps → common mistakes → conclusion. For comparison articles: introduction → context → comparison table → detailed analysis → conclusion.
Significant divergence from the expected ordering is penalized even if individual sections are well-written.
Factor 2.4: CTA Placement (3%)
Checks that your call-to-action appears after value delivery, not before. CTAs in the first 60% of the document receive a penalty on this factor.
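As a sketch, the 60% rule reduces to a position check. The CTA marker string is a hypothetical stand-in for real CTA detection:

```python
def cta_placement_ok(text: str, cta_marker: str = "Get started",
                     threshold: float = 0.60) -> bool:
    """Pass only if the first CTA appears after `threshold` of the document."""
    pos = text.find(cta_marker)
    if pos == -1:
        return True  # no CTA found; nothing to penalize on this factor
    return pos / len(text) >= threshold
```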
Domain 3: Content Depth (20% Weight)
Factor 3.1: Word Count vs. SERP Median (7%)
The content_length_comparator.py module fetches live SERP data for the target keyword, extracts and parses the top-10 results, and calculates the word count median. Your article's score on this factor is a function of how close it is to that median.
Target zone: 90–120% of SERP median. Below 75% receives a steep penalty; above 140% receives a minor penalty (overly long content correlates with lower time-on-page).
The live SERP fetch is what makes this factor genuinely useful. A fixed word count target (e.g., “all articles should be 2,000 words”) ignores that SERP medians vary from 800 words to 4,500 words depending on the keyword. The benchmark is always relative.
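A sketch of the scoring function: the zone boundaries (90–120% target, 75% steep-penalty threshold, 140% minor-penalty threshold) come from the description above, while the slopes between them are illustrative assumptions.

```python
def depth_score(word_count: int, serp_median: int) -> float:
    """Score word count relative to the live SERP median (0.0-1.0)."""
    ratio = word_count / serp_median
    if 0.90 <= ratio <= 1.20:
        return 1.0                                # target zone
    if ratio < 0.75:
        return max(0.0, ratio / 0.75 * 0.4)       # steep penalty region
    if ratio < 0.90:
        return 0.4 + (ratio - 0.75) / 0.15 * 0.6  # ramp up toward target
    if ratio <= 1.40:
        return 1.0 - (ratio - 1.20) / 0.20 * 0.2  # gentle decline
    return 0.8                                    # minor flat penalty
```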
Factor 3.2: Topic Coverage Breadth (5%)
Uses term frequency–inverse document frequency (TF-IDF) analysis against the top-ranking SERP results to identify the semantic field expected for the topic. Your article is scored on how many of these expected topic terms it contains.
This is a coverage check, not a keyword stuffing check. It’s asking: does your article discuss the full topic space, or does it stay narrowly on the primary keyword while missing adjacent concepts that signal expertise?
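The coverage idea can be shown with a self-contained TF-IDF sketch. A production pipeline would use proper tokenization and a library such as scikit-learn; this only illustrates how expected terms are derived from SERP documents and checked against the article.

```python
from collections import Counter
import math

def expected_terms(serp_docs: list[str], top_n: int = 20) -> list[str]:
    """Rank terms by summed TF-IDF across the top-ranking SERP documents."""
    tokenized = [doc.lower().split() for doc in serp_docs]
    df = Counter()
    for doc in tokenized:
        df.update(set(doc))  # document frequency
    n_docs = len(tokenized)
    scores = Counter()
    for doc in tokenized:
        tf = Counter(doc)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term]) + 1  # smoothed IDF
            scores[term] += (count / len(doc)) * idf
    return [t for t, _ in scores.most_common(top_n)]

def coverage_score(article: str, serp_docs: list[str]) -> float:
    """Fraction of expected topic terms the article actually contains."""
    terms = expected_terms(serp_docs)
    present = set(article.lower().split())
    return sum(t in present for t in terms) / len(terms)
```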
Factor 3.3: Section Depth (5%)
Measures average content volume per section: total body tokens divided by heading count. Articles with many shallow sections (one or two sentences under each H2) score poorly. The target is substantive coverage under each heading.
Factor 3.4: Information Density (3%)
The ratio of concrete information (numbers, named entities, specific terminology) to total text. High information density is characteristic of expert content; low density is characteristic of AI padding. The analyzer uses named entity recognition and number extraction to quantify this.
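A rough regex-based proxy for this ratio is shown below. The real module uses named entity recognition; this heuristic (digits plus mid-sentence capitalized tokens) only illustrates what is being measured.

```python
import re

def information_density(text: str) -> float:
    """Share of tokens that are numbers or named-entity candidates
    (capitalized words not at the start of a sentence)."""
    tokens = text.split()
    concrete = 0
    for i, tok in enumerate(tokens):
        word = tok.strip(".,;:()\"'")
        if re.search(r"\d", word):
            concrete += 1  # contains a number: concrete information
        elif word[:1].isupper() and i > 0 and not tokens[i - 1].endswith((".", "!", "?")):
            concrete += 1  # capitalized mid-sentence: likely a named entity
    return concrete / len(tokens) if tokens else 0.0
```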
Domains 4–6: Readability, Structure, Authority Signals
These three domains collectively account for 35% of the composite score. Here’s the condensed breakdown:
Readability Domain (15% Weight)
| Factor | Weight | Measurement |
|---|---|---|
| Sentence length | 5% | Average words/sentence, target 15–22 |
| Reading grade level | 5% | Flesch-Kincaid Grade Level, target 8–10 |
| Paragraph structure | 3% | Average sentences/paragraph, target 3–5 |
| Active/passive ratio | 2% | Active voice target >80% |
The readability_scorer.py module runs all four measurements in a single pass. Grade level and sentence length are calculated using standard algorithms. Active/passive classification uses a lightweight dependency parsing approach — checking for auxiliary verb + past participle patterns.
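The auxiliary-plus-participle pattern can be approximated with a regex, shown here as a sketch: the irregular-participle list is deliberately incomplete, and a dependency parser would catch cases this misses.

```python
import re

AUX = r"\b(is|are|was|were|be|been|being)\b"
# -ed forms plus a few common irregular participles (illustrative, not exhaustive)
PARTICIPLE = r"\b(\w+ed|written|given|taken|made|done|seen|known|found)\b"
# Aux verb, optional adverb, then a past participle, e.g. "was quickly calculated"
PASSIVE_RE = re.compile(AUX + r"\s+(?:\w+\s+)?" + PARTICIPLE, re.IGNORECASE)

def active_ratio(sentences: list[str]) -> float:
    """Fraction of sentences with no aux + past-participle pattern."""
    active = sum(1 for s in sentences if not PASSIVE_RE.search(s))
    return active / len(sentences) if sentences else 1.0
```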
Technical Structure Domain (10% Weight)
| Factor | Weight | Measurement |
|---|---|---|
| Heading hierarchy | 4% | H1/H2/H3 validity, keyword in headings |
| Meta title quality | 3% | Length (50–60 chars), keyword placement |
| Meta description | 2% | Length (150–160 chars), keyword + value prop |
| URL quality | 1% | Keyword inclusion, length, lowercase hyphens |
These are the most mechanical factors. They’re either correct or they’re not. Teams that understand this often fix all four in under 10 minutes — they’re fast wins.
Authority Signals Domain (10% Weight)
| Factor | Weight | Measurement |
|---|---|---|
| Internal link count | 4% | Minimum 3, optimal 4–6 |
| Internal anchor quality | 3% | Descriptive vs. generic anchor text |
| External authority links | 2% | Authority score of linked domains |
| E-E-A-T indicators | 1% | Author bio signals, claim sourcing |
The internal anchor quality factor specifically flags generic anchor text (“click here,” “read more,” “learn more”) and gives zero credit for those links. Descriptive anchors like “our guide to keyword density best practices” count fully.
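The zero-credit rule for generic anchors reduces to a denylist check. The anchor list below is illustrative; the module's actual denylist may be longer.

```python
GENERIC_ANCHORS = {"click here", "read more", "learn more", "here", "this"}

def anchor_quality(anchors: list[str]) -> float:
    """Fraction of internal links whose anchor text is descriptive.
    Generic anchors earn zero credit, per the factor definition."""
    if not anchors:
        return 0.0  # no internal links at all also fails this factor
    descriptive = sum(1 for a in anchors
                      if a.strip().lower() not in GENERIC_ANCHORS)
    return descriptive / len(anchors)
```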
How the Composite Score Is Calculated
The composite is a weighted sum:
```python
composite = (
    keyword_score * 0.25 +
    intent_score * 0.20 +
    depth_score * 0.20 +
    readability_score * 0.15 +
    structure_score * 0.10 +
    authority_score * 0.10
)
```
Each domain score is 0–100 before weighting; the per-domain values in the result dictionary are post-weighting contributions (a keyword domain score of 84, weighted at 25%, appears as 21). The composite is the sum of those weighted contributions, and it is the number you see in the platform dashboard.
What matters for your workflow: the top three domains account for 65% of your score. If you’re scoring below 75, you’re almost certainly failing at least one of keyword, intent, or depth. The structure and authority domains are recoverable in 15–20 minutes. The top three require more substantive fixes.
Interpreting Your Score Distribution
Here’s what score ranges typically indicate about which domains need work:
| Composite Range | Most Likely Culprits |
|---|---|
| 85–100 | Minor polish on authority signals |
| 75–84 | One failing domain, usually depth or intent |
| 65–74 | Two failing domains, often keyword + depth |
| 55–64 | Fundamental structural mismatch with SERP pattern |
| Below 55 | Content type mismatch or completely off-keyword |
A score below 55 almost always means the content doesn’t match what’s ranking for the target keyword. This is usually a problem with the content brief: the article was written for the wrong intent type (you wrote a listicle for a query that rewards how-to guides, or a how-to for a query where the SERP is dominated by comparison content).
Why Automated Quality Checks Matter at Scale
Here’s the practical application: manual review of 30 articles per month is feasible. Manual review of 150 articles per month is not. At content production scale, automated content quality checks are what separate teams that improve their content over time from teams that ship inconsistent quality.
The 24-factor framework gives you a quality floor. Every article that hits 80+ has met a minimum threshold across all six domains. That doesn’t guarantee a top-3 ranking — topical authority, backlinks, and domain-level signals still matter significantly. But it means you’re not publishing content that fails on basic quality dimensions.
Our AI content optimization tips article covers the practical optimization workflow in more detail: how to use the per-domain scores to prioritize your editing passes.
Conclusion
The 24-factor automated content quality checks framework reflects a core design philosophy: SEO quality is not a single thing. It’s the product of six distinct domains, each requiring separate measurement and different interventions when they fail.
A black-box single score doesn’t give you this. Understanding which specific factors drive your composite — and how each factor is calculated — is what lets you improve systematically rather than guessing.
Under the hood, all of this is standard NLP + statistical analysis. The value isn’t in any single algorithm; it’s in having all the algorithms applied in a coordinated pipeline, with each factor weighted by its empirical contribution to ranking performance.
That’s the architecture. Use it to audit your content, understand your gaps, and optimize with precision.
See your full 24-factor score breakdown on your next article. Try Agentic Marketing free — 5 articles, no credit card required.
Get your content quality score →