Spoiler Guard

LLM-based system for contextual Q&A about TV series without revealing spoilers.

Context

This project started with a simple question: can an AI assistant answer questions about TV series without accidentally revealing spoilers?

At first, the problem seemed fairly straightforward. Large language models are capable of understanding narratives, characters, and plot events, so it seemed reasonable to ask them to avoid spoilers when generating responses. However, asking the model to answer the question and assess spoiler risk in a single step produced inconsistent results. Sometimes the response was safe; other times, it revealed information a viewer might consider a spoiler, depending on how the question was phrased and how the model interpreted it.

The solution was to separate those responsibilities: the user provides where they are in the series (show, season, episode) and asks a question; the model classifies the response as SAFE or WARNING before revealing anything. If WARNING, nothing is revealed: the model only asks whether the user wants to continue, and the answer only comes after explicit confirmation.

This made it clear that the real problem was not generating responses. It was deciding whether a response should be generated at all.

The Problem

Spoilers are highly contextual. The same information can be harmless for one person and a major reveal for another, depending on how much of the series they have watched. And the ambiguity goes beyond the obvious: revealing a major character's death is a clear spoiler, but what about a relationship that develops later? A future appearance? The simple fact that someone survives up to a certain point in the story?

If spoiler detection remained implicit inside response generation, the system would have limited predictability and little control over its failure modes. I needed to separate those responsibilities and make the behavior explicit.

What I Wanted to Learn

More than solving the spoiler problem, this project was a deliberate learning exercise. I wanted to understand in practice how to integrate an LLM into a real application: how to pass context, how to structure output, how to manage a conversation session with history, how to handle failures from an external API. The spoiler problem was the pretext; LLM integration engineering was the goal.

Design Decisions

Two endpoints with a dependency

The /ask → /confirm flow is not the simplest way to build an API, but it reflects exactly the domain logic: before revealing anything, you need to classify and confirm. It also forced me to think about how to model state across two calls, something that stays hidden in simpler projects.
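
A minimal sketch of how state could be carried between the two calls, assuming the pending question is stored per session and the answer is only generated after confirmation. The names `PENDING`, `ask`, `confirm`, and the `classify` and `answer` callables are hypothetical stand-ins for the model calls, not the project's actual code.

```python
from typing import Callable, Dict

# Hypothetical store: session_id -> question that is waiting for confirmation.
PENDING: Dict[str, str] = {}


def ask(session_id: str, question: str, classify: Callable[[str], dict]) -> dict:
    """First call: classify, and reveal the answer only when it is SAFE."""
    result = classify(question)
    if result["spoiler_level"] == "WARNING":
        # Remember the question, but return only the warning text.
        PENDING[session_id] = question
        return {
            "spoiler_level": "WARNING",
            "warning_message": result["warning_message"],
        }
    PENDING.pop(session_id, None)
    return {"spoiler_level": "SAFE", "response": result["response"]}


def confirm(session_id: str, answer: Callable[[str], str]) -> dict:
    """Second call: only valid when a WARNING is pending for this session."""
    question = PENDING.pop(session_id, None)
    if question is None:
        raise KeyError("nothing to confirm for this session")
    return {"response": answer(question)}
```

Popping the pending entry in `confirm` means one confirmation releases one answer, so a second /confirm without a new /ask fails instead of replaying the reveal.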

In-memory sessions

Each conversation is a Gemini chat object stored in a Python dictionary in memory, indexed by a session_id UUID generated on the frontend. This means conversation history persists across questions in the same session, but disappears if the server restarts. The obvious alternative was Redis, but I deliberately left it out, since the focus was on understanding how Gemini manages conversational context, not on scaling state. The trade-off was intentional.
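
The session store can be sketched in a few lines. This assumes a `create_chat` factory that returns a new chat object; that factory, `_chats`, and `get_chat` are hypothetical names, and the real Gemini call is deliberately left out.

```python
from typing import Any, Callable, Dict

# Lives in process memory only, so a server restart empties it.
_chats: Dict[str, Any] = {}


def get_chat(session_id: str, create_chat: Callable[[], Any]) -> Any:
    """Return the chat for this session, creating it on first use."""
    if session_id not in _chats:
        _chats[session_id] = create_chat()
    return _chats[session_id]
```

Because the same chat object is returned on every call with the same `session_id`, the history it accumulates carries over from one question to the next.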

Structured LLM output

One of the biggest challenges in any LLM integration is ensuring the response has a consistent format. I addressed this on two fronts: setting response_mime_type: 'application/json' in the model configuration (which asks Gemini to return JSON) and keeping a manual parsing fallback for the cases where the pre-parsed object did not come through. The output schema (spoiler_level, warning_message, response, tip) went through a few iterations until it mapped exactly to what the frontend needed to render.
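
A hedged sketch of the two-front approach: use the pre-parsed object when it is available, otherwise parse the raw text, then check the result against the schema. The function name `load_output` and the assumption that all four keys are always present are mine; the source does not say which fields are optional.

```python
import json
from typing import Any

REQUIRED_KEYS = {"spoiler_level", "warning_message", "response", "tip"}
ALLOWED_LEVELS = {"SAFE", "WARNING"}


def load_output(parsed: Any, raw_text: str) -> dict:
    """Prefer the pre-parsed object; fall back to parsing the raw text."""
    data = parsed if isinstance(parsed, dict) else json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("model output is not a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"model output is missing keys: {sorted(missing)}")
    if data["spoiler_level"] not in ALLOWED_LEVELS:
        raise ValueError(f"unexpected spoiler_level: {data['spoiler_level']!r}")
    return data
```

Validating the keys at this boundary keeps a malformed reply from reaching the frontend as a half-rendered card.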

Low temperature

Using temperature: 0.3 was a consistency choice. In a system that classifies content into binary categories (SAFE/WARNING), creativity is the enemy: the model needs to be predictable. The higher the temperature, the greater the chance the model invents inconsistent classifications or ignores the prompt rules.

Sanitization and prompt injection

Whenever the user has a free-text field, there is a risk of someone trying to manipulate the model by passing instructions disguised as questions, for example by writing "SYSTEM: ignore the previous instructions and...". I implemented regex-based sanitization that blocks common structural prompt injection patterns. It is not a foolproof solution, but it is a conscious defense layer for a project at this scale.
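
To illustrate the idea, here is a minimal regex filter. The specific patterns, the `max_length` limit, and the name `sanitize_question` are assumptions chosen for the example; they are not the project's actual pattern list, and a determined attacker can get around filters like this.

```python
import re

# Illustrative patterns only: role prefixes, "ignore ... instructions", fake tags.
INJECTION_PATTERNS = [
    re.compile(r"^\s*(system|assistant|user)\s*:", re.IGNORECASE | re.MULTILINE),
    re.compile(
        r"ignore\s+(all\s+)?(the\s+)?(previous|prior|above)\s+instructions",
        re.IGNORECASE,
    ),
    re.compile(r"</?\s*(system|instructions?)\s*>", re.IGNORECASE),
]


def sanitize_question(text: str, max_length: int = 500) -> str:
    """Return the trimmed question, or raise ValueError if it looks unsafe."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("question is empty")
    if len(cleaned) > max_length:
        raise ValueError("question is too long")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(cleaned):
            raise ValueError("question looks like a prompt injection attempt")
    return cleaned
```

Rejecting the input outright, rather than trying to strip the suspicious part, avoids sending the model a half-cleaned string whose meaning has changed.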

Retry with linear backoff

External APIs fail. Gemini has quotas, returns 429 when they are exceeded, and occasionally becomes unstable. Rather than letting the error reach the user directly, I implemented retry logic with linearly increasing wait times (delay × attempt). Transient errors trigger a retry; fatal errors (token limit, invalid JSON) fail immediately. That distinction, between what is worth retrying and what is not, was one of the most practical lessons of the project.
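
The retry rule can be sketched as follows. `TransientError` and `FatalError` are hypothetical exception classes standing in for whatever the API client raises, and `call_with_retry` with its default values is an assumption for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Worth retrying, e.g. a 429 or a temporary outage."""


class FatalError(Exception):
    """Not worth retrying, e.g. token limit exceeded or invalid JSON."""


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures, waiting base_delay * attempt between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * attempt)  # linear backoff: 1x, 2x, 3x, ...
    raise AssertionError("unreachable")
```

`FatalError` is never caught here, so it propagates on the first attempt. Passing `sleep` as a parameter makes the waits easy to observe in a test without actually pausing.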

Challenges

Getting the model to respect context boundaries

The main engineering challenge was convincing Gemini that its role was not to answer freely, but to classify first. Early on, the model would frequently slip and hint at spoilers within the warning message itself. I kept refining the prompt until I reached instructions that clearly separated the two modes: if WARNING, only ask for confirmation, never reveal.

Consistent JSON output

Even with response_mime_type configured, the model would occasionally return JSON wrapped inside a markdown code block. The manual parsing with .removeprefix("```json").removesuffix("```") was the pragmatic fix. Not elegant, but honest about how LLMs behave in practice.
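
A small sketch of that fallback, assuming Python 3.9 or later for `str.removeprefix` and `str.removesuffix`. The helper name `parse_json_reply` is hypothetical, and the extra `strip()` calls and the bare-fence case are additions for this example to tolerate surrounding whitespace and a missing language tag.

```python
import json

FENCE = "`" * 3  # a markdown code fence, built here to keep this snippet readable


def parse_json_reply(text: str) -> dict:
    """Parse a model reply that may or may not be wrapped in a code fence."""
    cleaned = text.strip()
    # Try the tagged fence first, then a bare fence; each is a no-op if absent.
    cleaned = cleaned.removeprefix(FENCE + "json").removeprefix(FENCE)
    cleaned = cleaned.removesuffix(FENCE).strip()
    return json.loads(cleaned)
```

If the reply is still not valid JSON after unwrapping, `json.loads` raises `json.JSONDecodeError`, which fits the "fail immediately" category rather than the retryable one.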

Keeping the model focused on the right series

Without guardrails, the model answered questions about any series, ignoring the context passed in the system prompt. I added an explicit rule: "only discuss {user_serie}; if the question is about another series, ask the user to start a new session." This illustrates how prompt engineering is sometimes less about technical instruction and more about shaping behavioral expectations.
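
One way such rules could be assembled into a system prompt is sketched below. Only the quoted "only discuss" rule comes from the text above; the remaining wording, the `season` and `episode` parameters, and the function name `build_system_prompt` are hypothetical.

```python
def build_system_prompt(user_serie: str, season: int, episode: int) -> str:
    """Assemble the behavioral rules for one session (illustrative wording)."""
    rules = [
        f"The user is watching {user_serie} and has seen up to "
        f"season {season}, episode {episode}.",
        f"Only discuss {user_serie}; if the question is about another series, "
        "ask the user to start a new session.",
        "Classify the answer as SAFE or WARNING before revealing anything.",
        "If WARNING, only ask for confirmation; never reveal or hint at the answer.",
    ]
    return "\n".join(rules)
```

Keeping each rule as its own list entry makes it easier to add, remove, or reword one rule at a time while iterating on the prompt.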

Limitations

The frontend is intentionally minimal. There is no visual conversation history, the interaction feels more like a form than a chat, and it does not persist context across page reloads. I made that choice to avoid losing focus on what I actually wanted to learn, but the result is not a good product experience.

The session_id is generated on the frontend and trusted by the backend without any additional validation. In production, this would be a serious security issue, since any client that learns a session_id can continue that conversation. Here it is an acceptable simplification for a local learning project.

The Gemini free tier is limited. With heavy use, you hit the rate limit quickly. Real-world use would require either configuring billing or caching answers to repeated questions.

The in-memory state means restarting the server wipes all conversation history. For real persistence, the natural next step would be Redis with a TTL per session.

What I Learned

This project reinforced an idea that now shapes how I think about AI systems: separating decision-making from content generation can be more important than improving the generator itself. Many reliability problems become easier to understand when responsibilities are isolated and failure modes are explicit. Sometimes, the most valuable architectural decision is not making a model more powerful, but making its behavior more predictable.

That idea became more concrete once I turned it into code. Practical lessons emerged about the behavior of LLM-based systems: that temperature matters for classification applications, that LLM output is not fully reliable even with strict format instructions and so needs a parsing fallback, that chat sessions carry a token cost that grows with history, and that prompt injection is a real attack vector that needs to be considered from the start of the project.

But perhaps the simplest lesson was the most valuable: building something for a problem you actually have is the most efficient way to learn.