Analysis: AI assistants inconsistent on answering streaming availability queries

June 2, 2026

A controlled accuracy analysis of streaming title availability data found that ChatGPT scored 43.76 per cent and Claude scored 50.21 per cent when tested against manually verified ground truth across 100 popular US titles, compared to 96.89 per cent accuracy from Reelgood, the streaming data and metadata platform. The analysis, conducted by Reelgood on March 5th, tested each source against the same set of 50 movies and 50 TV shows using identical queries.

The findings arrive as AI assistants are increasingly used for content discovery and recommendation. Both OpenAI and Anthropic have expanded their platforms into media and entertainment partnerships, where accurate ‘where to watch’ data is a baseline requirement for product integrations. When an AI assistant tells a user a title is available on a service where it is not, or fails to list services where it is available, the downstream effects include user frustration, wasted clicks, and erosion of trust in the platform.

Why LLM-generated title availability data is unreliable

Large language models weren’t built to track real-time catalogue changes. The training data and retrieval pipelines they draw from were built for a different purpose, and the result is a predictable set of errors when they’re asked to report what’s streaming where.

Reelgood’s analysis identified six distinct error categories that account for the majority of inaccuracies in both ChatGPT’s and Claude’s responses. These are not random mistakes. They reflect structural gaps in how large language models handle streaming availability data.

Six Systematic Error Patterns

Stale Availability. Models confidently report titles as currently streaming on services they’ve already left. The cause is structural: entertainment press covers new additions to a catalogue extensively but rarely follows up when a title quietly leaves weeks or months later. The training corpus skews heavily toward those announcements, so the model treats outdated positives as current. This is the most pervasive error pattern observed.

Add-On and Bundle Confusion. Models frequently treat titles available through paid add-on channels (such as Starz or Paramount+ on Prime Video) as if they were part of the parent service’s base subscription. Users are told a title is streaming “on Prime Video” when accessing it actually requires a separate Starz or Paramount+ add-on inside Prime Video, creating the false impression that their existing subscriptions cover it.

Long-Tail Service Gaps. Free and ad-supported services like Tubi, Pluto TV, Fawesome, Hoopla, and Kanopy are consistently omitted, even when they’re valid sources for a given title.

SVoD/TVoD Conflation. Models sometimes list a service as a subscription (SVoD) option when the title is only available there for rent or purchase, misleading users about what their existing subscriptions actually cover.

TVoD Blindness. Both models almost entirely omit transactional VoD (rent/buy) options from services like Apple TV and Amazon, affecting the majority of titles tested.

Title Disambiguation Failures. When multiple versions of a title exist (such as One Piece, which has both an anime series and a live-action Netflix adaptation), models conflate availability across different versions.
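
To make the error patterns concrete, the sketch below shows one way an assistant's answer for a single title could be compared against verified availability. It is illustrative only: the article does not describe Reelgood's scoring method, so the `Offer` type, the `compare` and `jaccard_score` helpers, the access labels, and the sample data are all hypothetical, and the overlap score is an assumed stand-in rather than the metric behind the published percentages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One way to watch a title (hypothetical schema, not Reelgood's)."""
    service: str  # e.g. "Prime Video"
    access: str   # assumed labels: "subscription", "addon", "free", "rent_buy"


def compare(claimed: set[Offer], verified: set[Offer]) -> dict[str, set[Offer]]:
    """Split an assistant's answer into correct offers, false positives, and omissions."""
    return {
        "correct": claimed & verified,
        # Claimed but not verified: stale listings, add-on or SVoD/TVoD confusion.
        "false_positives": claimed - verified,
        # Verified but not claimed: long-tail gaps, missing rent/buy options.
        "omissions": verified - claimed,
    }


def jaccard_score(claimed: set[Offer], verified: set[Offer]) -> float:
    """Overlap between claimed and verified offers (an assumed metric)."""
    union = claimed | verified
    return len(claimed & verified) / len(union) if union else 1.0


# Made-up data for one hypothetical title; not taken from the analysis.
verified = {
    Offer("Tubi", "free"),
    Offer("Apple TV", "rent_buy"),
    Offer("Prime Video", "addon"),
}
claimed = {
    Offer("Prime Video", "subscription"),  # add-on reported as base subscription
    Offer("Netflix", "subscription"),      # title has already left the service
    Offer("Apple TV", "rent_buy"),
}

result = compare(claimed, verified)
print(sorted(o.service for o in result["false_positives"]))
print(sorted(o.service for o in result["omissions"]))
print(jaccard_score(claimed, verified))
```

Because each offer pairs a service with an access type, an answer that names the right service under the wrong access type counts as both a false positive and an omission, which is how add-on confusion and SVoD/TVoD conflation would surface in this sketch.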
