Methodology
How Club Sentiment Measures Fan Mood
Club Sentiment is an independent football analytics project that measures the emotional state of a club’s public supporter ecosystem. Each day, the platform gathers supporter discussion, football media coverage, and public reaction across several channels, filters that evidence for club relevance, and converts it into a structured sentiment score from 1 to 100.
The system is designed to answer a specific question: how do supporters feel about the club right now? It does not attempt to rate squad quality, predict results, or judge tactical strength. It is a measurement of supporter mood as expressed in public football conversation.
1. What the score represents
Every club receives a daily score on a 1-100 scale. Higher values indicate a more positive supporter environment, while lower values indicate more frustration, pressure, pessimism, or emotional instability around the club.
The score is intended to measure the aggregate mood of publicly visible supporter discussion. That mood is shaped by results, injuries, transfers, manager pressure, media narratives, and broader club context. Because football fandom is emotional and often reactive, the system is built to preserve genuine swings in mood while reducing distortion from off-topic content, ambiguity, spam, and short-lived noise.
| Range | Label | Interpretation |
|---|---|---|
| 96-100 | Ecstatic | Peak euphoria, title-level or historic-result energy, or overwhelming optimism. |
| 91-95 | Electric | Supporter mood is intensely positive and emotionally charged. |
| 86-90 | Euphoric | Major positivity, broad unity, strong confidence. |
| 81-85 | Surging | Strong upward momentum in fan mood. |
| 76-80 | Buzzing | Clearly positive atmosphere with visible excitement. |
| 71-75 | Very Positive | Fans are confident, encouraged, and generally pleased. |
| 66-70 | Confident | Supporters are in a healthy positive state. |
| 61-65 | Encouraged | More positive than negative, but not emotionally explosive. |
| 56-60 | Optimistic | Positive lean with remaining uncertainty. |
| 51-55 | Leaning Positive | Slightly favorable balance of public mood. |
| 46-50 | Mixed | Divided reaction, balanced between positivity and frustration. |
| 41-45 | Uneasy | Confidence is weakening; unease is visible. |
| 36-40 | Frustrated | Negative feeling is established and sustained. |
| 31-35 | Disappointed | Fan confidence is deteriorating in a visible way. |
| 26-30 | Angry | Public supporter mood is sharply negative. |
| 21-25 | Toxic | Very negative discourse, often featuring blame and pressure. |
| 16-20 | Miserable | Deeply pessimistic atmosphere around the club. |
| 11-15 | Meltdown | Severe instability in supporter sentiment. |
| 1-10 | In Crisis | Extreme negativity or public emotional collapse. |
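The band table above can be applied mechanically. The following is a minimal sketch, assuming integer scores; the function name `label_for_score` is illustrative and not taken from the project's code.

```python
from bisect import bisect_right

# Lower bound of each band and its label, copied from the table above.
BANDS = [
    (1, "In Crisis"), (11, "Meltdown"), (16, "Miserable"), (21, "Toxic"),
    (26, "Angry"), (31, "Disappointed"), (36, "Frustrated"), (41, "Uneasy"),
    (46, "Mixed"), (51, "Leaning Positive"), (56, "Optimistic"),
    (61, "Encouraged"), (66, "Confident"), (71, "Very Positive"),
    (76, "Buzzing"), (81, "Surging"), (86, "Euphoric"), (91, "Electric"),
    (96, "Ecstatic"),
]
LOWER_BOUNDS = [low for low, _ in BANDS]


def label_for_score(score: int) -> str:
    """Return the band label for an integer score from 1 to 100."""
    if not 1 <= score <= 100:
        raise ValueError(f"score must be between 1 and 100, got {score}")
    # Find the last band whose lower bound is <= score.
    return BANDS[bisect_right(LOWER_BOUNDS, score) - 1][1]
```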
2. Full daily flow
For each club, Club Sentiment runs a daily evidence pipeline. The process is designed to be source-aware, relevance-aware, and transparent enough to support historical comparison rather than only one-off daily reactions.
Step 1 — Team context
The system starts with a structured team record containing club name, slug, search name, league, and known source-specific aliases. These values shape how evidence is searched and how ambiguity is handled.
For clubs with ambiguous names, source-specific search overrides are used so the system searches for the football club rather than a city, person, product, or unrelated topic.
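As a rough illustration of the team record and its source-specific overrides, the sketch below models it as a small dataclass. The field names, the `query_for` helper, and the example values are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class TeamRecord:
    name: str
    slug: str
    search_name: str
    league: str
    # Hypothetical per-source search overrides, keyed by source name.
    search_overrides: Dict[str, str] = field(default_factory=dict)

    def query_for(self, source: str) -> str:
        """Use the source-specific override if present, else the search name."""
        return self.search_overrides.get(source, self.search_name)


# Illustrative values only.
team = TeamRecord(
    name="Chelsea",
    slug="chelsea",
    search_name="Chelsea",
    league="Premier League",
    search_overrides={"news": "Chelsea FC football"},
)
```

Here `team.query_for("news")` would return the override, while a source without an override falls back to the plain search name.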
Step 2 — Parallel evidence collection
The platform then gathers evidence in parallel from four public source families: Reddit, X, YouTube, and football news.
Each source has its own search logic, relevance logic, and fallback logic. The aim is not to force identical behavior across all platforms, but to extract the most useful football signal from each environment.
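One way to picture the parallel step is a thread pool with one task per source family. This is a sketch under stated assumptions: the collector callables are stubs, and treating a failed source as an empty list is a choice made here, not documented behavior.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_all(team_name, collectors):
    """Run one collector per source family concurrently.

    `collectors` maps a source name to a callable that takes the team name
    and returns a list of text snippets. A source that raises contributes
    an empty list (an assumption of this sketch).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(collectors))) as pool:
        futures = {name: pool.submit(fn, team_name) for name, fn in collectors.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                results[name] = []
    return results


def _failing_collector(team_name):
    raise RuntimeError("source unavailable")


# Stub collectors standing in for the real Reddit, X, YouTube, and news code.
evidence = collect_all("Example FC", {
    "reddit": lambda team: [f"{team} match thread comment"],
    "x": _failing_collector,
    "youtube": lambda team: [],
    "news": lambda team: [f"{team} headline"],
})
```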
Step 3 — Relevance filtering
Raw evidence is filtered using club-specific rules and, for selected ambiguous clubs, an optional Gemini-based relevance gate.
If filtering becomes too aggressive and evidence volume drops below the minimum threshold, the system relaxes filtering in stages rather than letting the club go dark.
Step 4 — Daily sentiment analysis
The filtered evidence is sent to Gemini in a structured prompt that asks the model to score several football-specific sentiment dimensions, explain the dominant themes, and produce source-level sub-scores.
The final daily reactive score is then calculated from the dimension scores rather than taken from a single overall number supplied by the model.
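Deriving the reactive score from dimension scores could look like a weighted mean. The dimension names and weights below are hypothetical; the methodology does not publish the actual dimensions or how they are combined.

```python
# Hypothetical dimensions and weights, for illustration only.
DIMENSION_WEIGHTS = {
    "results_mood": 4,
    "manager_confidence": 3,
    "squad_belief": 2,
    "board_trust": 1,
}


def reactive_score(dimension_scores):
    """Weighted mean of per-dimension scores, clamped to the 1-100 scale."""
    total_weight = sum(DIMENSION_WEIGHTS.values())
    weighted_sum = sum(
        weight * dimension_scores[name] for name, weight in DIMENSION_WEIGHTS.items()
    )
    return max(1, min(100, round(weighted_sum / total_weight)))
```

Computing the headline number from its parts means a single dimension cannot silently disagree with the overall score.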
Step 5 — Baseline trend computation
After the reactive score is produced, the system looks back over prior days and computes a baseline sentiment trend using an exponentially weighted average of earlier reactive scores.
This baseline is intentionally slower-moving than the daily score, so it can represent the club’s broader emotional trend.
Step 6 — Display score
The score shown publicly on the site is a blend of the new daily reactive score and the prior baseline trend. This creates a display score that remains responsive without becoming unstable.
Evidence volume influences how much weight is given to the reactive layer versus the baseline layer.
3. How Reddit evidence is collected
Reddit is treated as a supporter-community source rather than a generic social feed. The system uses several progressively broader mechanisms, starting with direct club communities and expanding only when necessary.
3.1 Known subreddit mapping
Club Sentiment maintains a known set of team-to-subreddit mappings for clubs with established supporter communities. When a known mapping exists, that subreddit is searched first.
Additional candidate subreddit names are also generated from the team slug, normalized club name, and common naming variations.
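Candidate generation might be sketched as follows. The specific variations (compact slug, underscore slug, an `fc` suffix) are assumptions about what "common naming variations" could mean, and the example inputs are illustrative.

```python
import re
from typing import List, Optional


def subreddit_candidates(slug: str, club_name: str, known: Optional[str] = None) -> List[str]:
    """Return candidate subreddit names, known mapping first, without duplicates."""
    normalized = re.sub(r"[^a-z0-9]+", "", club_name.lower())
    raw = [
        known,                    # known mapping is searched first
        slug.replace("-", ""),    # compact slug
        slug.replace("-", "_"),   # underscore variant
        normalized,               # normalized club name
        normalized + "fc",        # hypothetical common suffix
    ]
    seen, ordered = set(), []
    for name in raw:
        if name and name.lower() not in seen:
            seen.add(name.lower())
            ordered.append(name)
    return ordered
```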
3.2 Team subreddit RSS
The Reddit collector first reads the club subreddit’s recent RSS feed. This provides recent post titles and summaries without requiring a Reddit API key.
Each item is stored with metadata such as source kind, publication time, subreddit name, post id, and whether the post appears to be a match thread.
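The stored metadata can be illustrated by parsing a feed offline. This sketch assumes the feed is Atom-formatted with `t3_`-prefixed post ids and an `updated` timestamp; the sample XML, the key names, and the match-thread heuristic are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}
MATCH_THREAD_RE = re.compile(r"\bmatch thread\b", re.IGNORECASE)


def parse_subreddit_feed(xml_text, subreddit):
    """Turn Atom entries into evidence items with basic metadata."""
    items = []
    for entry in ET.fromstring(xml_text).findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        raw_id = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        items.append({
            "source_kind": "reddit_rss_post",
            "subreddit": subreddit,
            "post_id": raw_id.split("_", 1)[-1],  # "t3_abc123" -> "abc123"
            "published": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            "title": title,
            "is_match_thread": bool(MATCH_THREAD_RE.search(title)),
        })
    return items


# Invented sample feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>t3_abc123</id><title>Post Match Thread: Home 2-1 Away</title>
    <updated>2024-01-01T17:00:00+00:00</updated></entry>
  <entry><id>t3_def456</id><title>Injury update on our captain</title>
    <updated>2024-01-01T12:00:00+00:00</updated></entry>
</feed>"""

items = parse_subreddit_feed(SAMPLE_FEED, "examplefc")
```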
3.3 Match-thread and post-comment enrichment
After fetching the freshest posts from club subreddits, the system prioritizes likely match threads and recent team posts with commentable post ids. It then fetches top comments from a limited number of those posts.
Match threads are allowed a higher comment cap than ordinary posts because they are often the highest-value expressions of real supporter mood immediately before, during, and after matches.
3.4 Subreddit discovery
If the direct team subreddit is weak, missing, or uncertain, the system searches Reddit’s subreddit directory for communities that resemble the club’s name and football context.
Candidate subreddits are scored using their display name, title, query-term overlap, and basic community size signals.
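A scoring function along these lines could combine the listed signals. The weights, the log-scaled size term, and the candidate dictionaries (including subscriber counts) are assumptions made for this sketch.

```python
import math


def score_subreddit(candidate, query_terms):
    """Score a candidate by name match, title match, and community size."""
    name = candidate["display_name"].lower()
    title = candidate.get("title", "").lower()
    terms = [t.lower() for t in query_terms]
    name_hits = sum(term in name for term in terms)
    title_hits = sum(term in title for term in terms)
    # Log-scaled size signal, capped at 1.0 so size alone cannot outrank a name match.
    size_signal = min(math.log10(candidate.get("subscribers", 0) + 1) / 6.0, 1.0)
    return 2.0 * name_hits + 1.0 * title_hits + size_signal


# Invented candidates for illustration.
candidates = [
    {"display_name": "villas", "title": "Holiday villas", "subscribers": 500000},
    {"display_name": "avfc", "title": "Aston Villa FC", "subscribers": 90000},
]
best = max(candidates, key=lambda c: score_subreddit(c, ["aston", "villa", "avfc"]))
```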
3.5 League and general-football fallback
If club-specific subreddit evidence is still thin, the system searches broader football communities such as league-level or general-football subreddits for recent posts mentioning the club.
This allows the platform to capture relevant discussion even when a club’s own subreddit is inactive or very small.
3.6 Ordering and de-duplication
Reddit evidence is bucketed into team, discovered, and fallback categories. Within those buckets, evidence is sorted by source type and freshness, then de-duplicated at the text level.
This preserves the highest-value direct supporter evidence while still allowing broader football discussion to supplement sparse clubs.
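The ordering and de-duplication described above might be sketched like this. The bucket names come from the text; the source-kind labels, their priority order, and the item keys are hypothetical.

```python
BUCKET_ORDER = {"team": 0, "discovered": 1, "fallback": 2}
# Hypothetical source kinds, highest-value first.
KIND_ORDER = {"match_thread_comment": 0, "post_comment": 1, "post": 2}


def order_and_dedupe(items):
    """Sort by bucket, source kind, and freshness, then drop repeated text."""
    ranked = sorted(
        items,
        key=lambda i: (
            BUCKET_ORDER[i["bucket"]],
            KIND_ORDER.get(i["kind"], len(KIND_ORDER)),
            -i["published_ts"],  # newer first
        ),
    )
    seen, unique = set(), []
    for item in ranked:
        # Text-level key: lowercase with whitespace collapsed.
        key = " ".join(item["text"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

Because sorting happens before de-duplication, the copy that survives is always the one from the highest-priority bucket.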
4. How X evidence is collected
X is treated as a rapid-reaction source. It is especially useful for major clubs, breaking news, and emotionally immediate public discussion after significant events.
4.1 Query construction
For selected clubs, Club Sentiment uses club-specific X query overrides that combine exact club names, major hashtags, and official or highly associated handles.
For clubs without an explicit override, the system builds a simpler search around the club name and a normalized hashtag form. Ambiguous clubs receive extra football-context terms to reduce drift.
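Query construction could be sketched as below, using the quoted-phrase, hashtag, mention, and `OR` grouping syntax of X search. The function name, parameters, and example handle are illustrative assumptions, not the project's actual overrides.

```python
import re


def build_x_query(club_name, hashtags=(), handles=(), context_terms=()):
    """Build a search query from a club name plus optional overrides."""
    parts = [f'"{club_name}"']
    # Without explicit hashtags, fall back to a normalized hashtag form.
    tags = list(hashtags) or [re.sub(r"[^A-Za-z0-9]", "", club_name)]
    parts += [f"#{tag.lstrip('#')}" for tag in tags]
    parts += [f"@{handle.lstrip('@')}" for handle in handles]
    query = "(" + " OR ".join(parts) + ")"
    if context_terms:
        # Extra football-context terms for ambiguous clubs.
        quoted = [f'"{t}"' if " " in t else t for t in context_terms]
        query += " (" + " OR ".join(quoted) + ")"
    return query
```

For example, `build_x_query("Everton", context_terms=("football", "Premier League"))` yields `("Everton" OR #Everton) (football OR "Premier League")`.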
4.2 Recent-volume check
Before spending search calls on full result collection, the system checks the recent tweet count for the query. If the count falls below a configured threshold, X evidence is skipped for that run.
This avoids wasting usage on clubs or days where X is unlikely to add meaningful signal.
4.3 Recent search and pagination
When a topic shows enough recent activity, the system uses X’s recent-search endpoint to fetch the newest relevant posts. For high-volume clubs or major stories, a second page may be requested.
This keeps the collector efficient: it expands when the topic is busy, but remains conservative when the topic is quiet.
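The volume check and pagination decision reduce to a small planning function. Both thresholds here are placeholders; the configured values are not published.

```python
def plan_x_pages(recent_count, min_count=20, busy_count=500):
    """Decide how many result pages to request for a query.

    Returns 0 when the topic is too quiet to be worth a search call,
    2 when it is busy enough to justify a second page, and 1 otherwise.
    """
    if recent_count < min_count:
        return 0
    return 2 if recent_count >= busy_count else 1
```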
4.4 Ranking
X posts are not used in raw API order alone. The system ranks them using a hybrid of freshness and engagement. Extremely fresh posts are favored, while public engagement helps separate meaningful supporter reaction from low-signal chatter.
The output is then de-duplicated so repeated phrasing or near duplicates do not overwhelm the day’s sample.
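A freshness-and-engagement hybrid can take many forms. The sketch below assumes exponential time decay and log-scaled engagement; the half-life, the repost weighting, and the formula itself are illustrative choices rather than the project's actual ranking.

```python
import math


def x_rank_score(age_hours, likes, reposts, replies, half_life_hours=6.0):
    """Combine freshness and engagement into a single ranking score."""
    # Freshness halves every `half_life_hours`.
    freshness = 0.5 ** (age_hours / half_life_hours)
    # Log scaling stops one viral post from dominating the sample.
    engagement = math.log1p(likes + 2 * reposts + replies)
    return freshness * (1.0 + engagement)
```

Multiplying rather than adding means an old post needs far more engagement to outrank a fresh one, which matches the stated preference for extremely fresh posts.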
4.5 Rate and usage awareness
The X collector inspects usage and rate-limit information and slows down when necessary. This protects system stability and helps preserve the source as a sustainable daily signal rather than a source that works only on some days.
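Rate awareness can be sketched from the `x-rate-limit-remaining` and `x-rate-limit-reset` response headers that the X API returns. The `reserve` parameter and the decision rule are assumptions of this sketch.

```python
def seconds_to_wait(headers, now, reserve=2):
    """Return how long to pause before the next X request.

    `headers` is a mapping of response headers; `now` is the current
    Unix time in seconds. Keeps `reserve` calls unused as a safety margin.
    """
    remaining = int(headers.get("x-rate-limit-remaining", reserve + 1))
    reset_at = int(headers.get("x-rate-limit-reset", 0))
    if remaining > reserve:
        return 0.0
    # Wait until the rate-limit window resets (never a negative duration).
    return max(0.0, float(reset_at - now))
```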
5. How YouTube evidence is collected
YouTube is treated as a supporter reaction layer. It is especially useful for post-match fan commentary, match reaction content, and discussion around emotionally loaded club events.
5.1 Query variants
The YouTube collector supports multiple football-specific query templates (for example match reaction, fan reaction, and post-match analysis) around the club’s search name.
In production, query expansion is currently constrained for quota control, so only a limited subset of those templates may run on a given scrape.
5.2 Video-level relevance screening
Before comments are collected, candidate videos are screened using the video title and description. The system checks whether the club is actually being discussed and whether the surrounding context is recognizably football-related.
Ambiguous club names are held to a stricter standard and must show both club identity and football context.
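The screening rule can be illustrated with simple term matching on the title and description. The football term list is invented, and letting non-ambiguous clubs pass on club identity alone is an assumption of this sketch.

```python
import re

# Hypothetical football-context vocabulary.
FOOTBALL_TERMS = ("football", "premier league", "goal", "match", "fans", "manager", "transfer")


def _mentions(text, terms):
    """True if any term appears in the text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(t.lower()) + r"\b", text) for t in terms)


def video_is_relevant(title, description, club_terms, ambiguous=False):
    text = f"{title} {description}".lower()
    has_club = _mentions(text, club_terms)
    has_football = _mentions(text, FOOTBALL_TERMS)
    # Ambiguous names must show both club identity and football context.
    return (has_club and has_football) if ambiguous else has_club
```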
5.3 Comment collection
Once relevant videos are identified, the system collects top-level comments from a limited number of the most recent eligible videos. Comments are trimmed to a bounded length and de-duplicated.
YouTube is therefore not being used as a raw video crawler. It is being used as a football fan-reaction sampler.
5.4 Cost control
YouTube search calls are quota-expensive relative to comment fetch calls, so the collector is intentionally selective. Only a subset of clubs is enabled by default for YouTube, with a focus on clubs and situations where YouTube adds the strongest signal.
6. How football news evidence is collected
News is treated as a narrative and framing source. It does not represent pure supporter voice, but it helps capture how club events are being described in public football coverage.
6.1 Query design
Club Sentiment uses a recent Google News RSS search and limits attention to the last seven days. For ambiguous clubs, football context is injected into the query itself.
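A query URL for this step might be assembled as follows. The `news.google.com/rss/search` path, the `when:7d` operator, and the `hl`/`gl`/`ceid` locale parameters reflect commonly observed Google News RSS usage and are assumptions here, as is the choice of "football" as the injected context term.

```python
from urllib.parse import urlencode


def news_rss_url(search_name, ambiguous=False, days=7):
    """Build a Google News RSS search URL limited to recent days."""
    query = f'"{search_name}"'
    if ambiguous:
        # Inject football context directly into the query.
        query += " football"
    query += f" when:{days}d"
    params = {"q": query, "hl": "en-GB", "gl": "GB", "ceid": "GB:en"}
    return "https://news.google.com/rss/search?" + urlencode(params)
```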
6.2 Headline and summary extraction
The collector reads both headlines and article summaries, strips markup, removes obvious junk, and stores the resulting snippets as candidate evidence.
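Markup stripping can be sketched with the standard library alone. The regex-based tag removal and the length cap are simplifications assumed for illustration; a production collector might use a real HTML parser.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def clean_snippet(raw, max_len=300):
    """Strip tags, decode entities, collapse whitespace, and bound the length."""
    text = TAG_RE.sub(" ", raw)       # remove markup
    text = html.unescape(text)        # "&amp;" -> "&"
    text = " ".join(text.split())     # collapse runs of whitespace
    return text[:max_len]
```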
6.3 Relevance and junk suppression
News evidence is kept only when it shows football context. Snippets carrying common low-value markers, such as betting-style language, live-update pages, or clearly non-club topics, are dropped.
This makes the news layer more useful as a media-tone signal than as a broad web-search feed.
6.4 Recency ranking
News snippets are ranked with a strong recency bias. The aim is to measure the club’s current narrative environment, so within the seven-day window the newest framing outweighs stories that are already obsolete.
7. Club relevance filtering
Raw evidence alone is not enough. Many club names are ambiguous, and online discussion is full of partial matches, unrelated references, and duplicate phrases. Club Sentiment therefore uses a layered relevance system.
7.1 Rule-based relevance
For selected ambiguous clubs, the system defines strong terms, football-context terms, and excluded terms. A snippet can be retained because it strongly identifies the club, or because it mentions the club alongside football context.
Snippets that match known non-football ambiguity patterns are excluded.
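The rule layer can be sketched as three checks applied in order. The `club_terms` field, the precedence of exclusions over strong terms, and the example term lists are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RelevanceRules:
    strong_terms: Tuple[str, ...]    # unambiguous club identifiers
    club_terms: Tuple[str, ...]      # bare club mentions (hypothetical field)
    context_terms: Tuple[str, ...]   # football-context vocabulary
    excluded_terms: Tuple[str, ...]  # known non-football ambiguity patterns


def is_relevant(snippet, rules):
    text = snippet.lower()
    if any(term in text for term in rules.excluded_terms):
        return False
    if any(term in text for term in rules.strong_terms):
        return True
    mentions_club = any(term in text for term in rules.club_terms)
    has_context = any(term in text for term in rules.context_terms)
    return mentions_club and has_context


# Illustrative rules for an ambiguous club name.
rules = RelevanceRules(
    strong_terms=("chelsea fc", "stamford bridge"),
    club_terms=("chelsea",),
    context_terms=("premier league", "goal", "match", "manager"),
    excluded_terms=("flower show",),
)
```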
7.2 Optional Gemini relevance gate
When enabled, a second layer sends batches of snippets to Gemini and asks the model to classify whether each snippet is truly about the football club in question.
This is used selectively, especially for ambiguous clubs, to keep cost under control while improving relevance quality.
7.3 Fail-open fallback logic
The system is intentionally designed not to collapse coverage when filtering becomes too strict. If evidence volume falls below the minimum threshold after full filtering, it retries with rules only. If evidence is still too thin, it falls back to raw source evidence.
This preserves continuity while still preferring cleaner evidence whenever possible.
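The staged fallback described above can be sketched as follows. The two filter callables and the `min_items` threshold are placeholders; only the order of the stages is taken from the text.

```python
def filter_with_fallback(raw, rule_filter, model_filter, min_items=10):
    """Apply the strictest filtering that still leaves enough evidence.

    Returns the kept snippets and the name of the stage that produced them.
    """
    # Stage 1: rules plus the optional model-based gate.
    full = [s for s in raw if rule_filter(s) and model_filter(s)]
    if len(full) >= min_items:
        return full, "rules+model"
    # Stage 2: rules only.
    rules_only = [s for s in raw if rule_filter(s)]
    if len(rules_only) >= min_items:
        return rules_only, "rules"
    # Stage 3: fail open to raw source evidence.
    return list(raw), "raw"
```

Returning the stage name alongside the evidence makes it possible to record how clean each day's sample was.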
8. Reactive score, baseline trend, and public display score
Club Sentiment separates immediate supporter reaction from slower trend behavior.
Reactive score
The reactive score is the day’s raw sentiment output based on that day’s evidence. It is intentionally responsive to recent matches, injuries, transfers, sack pressure, or sudden mood shifts.
Baseline score
The baseline is an exponentially weighted average of prior reactive scores over up to 30 days. Newer days carry more weight than older days, so the baseline can move, but more slowly than the daily reaction layer.
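An exponentially weighted average over a 30-day window might be computed like this. The decay factor of 0.9 is a placeholder, since the actual weighting is not published.

```python
def baseline_score(prior_scores, decay=0.9, window=30):
    """Exponentially weighted average of prior reactive scores.

    `prior_scores` is ordered oldest to newest and excludes today.
    Returns None when there is no history yet.
    """
    recent = list(prior_scores)[-window:]
    if not recent:
        return None
    # The newest score gets weight 1; each older day is multiplied by `decay`.
    weights = [decay ** age for age in range(len(recent) - 1, -1, -1)]
    return sum(w * s for w, s in zip(weights, recent)) / sum(weights)
```

With `decay=0.5` and scores `[40, 60]`, the newer score counts twice as much as the older one, giving a baseline of about 53.3.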
Display score
The public score displayed on the site blends the new reactive score with the prior baseline. This allows a club’s score to respond to real emotion without becoming too unstable from a single evidence burst.
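The blend can be sketched as a weighted average whose reactive weight grows with evidence volume. All parameter values, and the linear weighting itself, are assumptions of this sketch.

```python
def display_score(reactive, baseline, evidence_count,
                  full_evidence=40, min_weight=0.3, max_weight=0.7):
    """Blend the reactive score with the prior baseline.

    More evidence shifts weight toward the reactive layer; thin evidence
    leans on the baseline. With no baseline yet, the reactive score is used.
    """
    if baseline is None:
        return max(1, min(100, round(reactive)))
    volume = min(evidence_count / full_evidence, 1.0)
    weight = min_weight + (max_weight - min_weight) * volume
    blended = weight * reactive + (1.0 - weight) * baseline
    return max(1, min(100, round(blended)))
```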
9. Persistence, history, and continuity
Every club-day result is written to storage with the full score record, including the display score, reactive score, baseline score, baseline confidence, reasoning, source counts, dominant themes, and sample snippets.
This historical record serves two purposes. First, it allows Club Sentiment to compute trend baselines over time. Second, it enables users to compare supporter mood across days, matches, and broader club cycles rather than seeing each day in isolation.
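A stored club-day record could be modeled as below. The fields mirror the list above, but the names, types, and JSON serialization are assumptions rather than the project's storage schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClubDayRecord:
    club_slug: str
    date: str  # ISO date string
    display_score: int
    reactive_score: int
    baseline_score: Optional[float]
    baseline_confidence: Optional[float]
    reasoning: str
    source_counts: Dict[str, int] = field(default_factory=dict)
    dominant_themes: List[str] = field(default_factory=list)
    sample_snippets: List[str] = field(default_factory=list)


def to_json(record: ClubDayRecord) -> str:
    """Serialize one club-day record for storage."""
    return json.dumps(asdict(record), sort_keys=True)
```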
10. What the system is designed to do well
- Capture short-term supporter mood after results, transfers, or major club events.
- Preserve direct fan voice through club communities, match-thread comments, and reaction-heavy platforms.
- Reduce noise caused by ambiguous club names and low-quality off-topic snippets.
- Produce a public-facing score that is reactive enough to feel real but stable enough to be historically meaningful.
- Build a longitudinal trend layer that becomes stronger as history accumulates.
11. What the system does not claim
- It is not a betting model.
- It is not a prediction engine.
- It is not a rating of football quality.
- It does not claim to see all supporter discussion everywhere.
- It does not interpret every source identically; each source is used for the kind of football signal it provides best.
12. Why this methodology exists
Football is not only a sport of results. It is also a sport of emotion, expectation, pressure, belief, resentment, unity, and reaction. Those forces shape the public environment around clubs every day.
Club Sentiment exists to measure that emotional environment with as much structure, consistency, and methodological honesty as possible. The score is not meant to replace football analysis. It is meant to add a new layer to it: a disciplined measure of public supporter mood.
Summary