Storyland: A Bedtime Story Generator

Role: Project lead | Status: Complete

GitHub Demo

OVERVIEW

Storyland is an AI-driven storytelling framework designed to craft, evaluate, and refine bedtime stories for children aged 5–10. The system orchestrates multiple specialized AI roles that collaborate toward a shared creative goal. Each role performs a distinct reasoning function (understanding, planning, storytelling, judging, and refining), allowing the model to operate with structured decision-making rather than a single one-shot prompt. (Use the Storyland Web App)

Storyland supports both Autonomous and Human-in-the-Loop storytelling:

  1. Autonomous Mode: The AI independently generates, evaluates, and refines stories until they meet its internal quality criteria.
  2. Human-in-the-Loop Mode: The user becomes an active co-creator — able to read the first draft, provide feedback, and guide the AI to improve tone, pacing, or themes. This interaction mimics the editorial loop between a writer and an editor, bringing a layer of human creativity and intent into the refinement process.

MOTIVATION

As a Ph.D. candidate deeply involved in storytelling research and narrative data, this project resonates with my personal and academic interests. I’ve spent years studying how stories evoke emotion, structure meaning, and adapt to different audiences, and this assignment felt like a natural extension of that passion. Therefore, I designed Storyland not as just another AI storytelling demo, but as an experience that blends AI design thinking with practical software engineering to explore how computational systems can mimic human creativity through structured reasoning.

Core Design Philosophy

My design choices for Storyland were guided by both my background as a Ph.D. student working with storytelling methods and data, and by my belief that good AI systems should reflect both creative reasoning and engineering discipline. Rather than building a single end-to-end prompt, I approached the design as a creative collaboration between specialized reasoning agents, each performing a defined cognitive task. This mirrors how human authors, editors, and reviewers work together in real storytelling contexts.

Key Design Decisions and Rationale

  1. Decomposition over Monoliths: Breaking the storytelling process into smaller, well-defined components (Classifier, Planner, Storyteller, Judge, Refiner) allowed greater interpretability and control. Each component focuses on one part of the creative process — a deliberate move to make the system’s reasoning transparent and debuggable rather than a black box.
  2. Structured Prompting with JSON: Instead of free-form natural-language chaining, I used JSON-based responses (chat_strict_json) to ensure well-structured, machine-parseable outputs and stable communication between agents. This enforces a reproducible workflow — essential for academic rigor and evaluation, especially in narrative research.
  3. Automated Evaluation Loop (Judge + Refiner): In creative generation, subjective quality varies widely. To mitigate that, I introduced a self-critique mechanism where the Judge scores the story across specific dimensions (clarity, kindness, engagement, vocabulary, arc, age-fit), and the Refiner improves the text based on feedback. This iterative refinement loop mirrors editorial review cycles in human storytelling (i.e., embedding a feedback-driven quality control system).
  4. Human-in-the-Loop Integration: Beyond autonomous refinement, I implemented a Human-in-the-Loop mode that invites users to actively participate in story improvement. After reading the first AI-generated draft, users can provide targeted feedback. This feedback is then reintroduced into the refinement loop, guiding the AI to adjust tone, pacing, or style. It transforms Storyland from a one-way generator into an interactive storytelling partner, illustrating how user collaboration can enhance creativity and control.
  5. Ethical and Safe Content Design: Since the audience is children, I integrated a pre-processing layer that sanitizes unsafe or violent input terms. This ensures emotional safety and aligns with responsible AI storytelling principles, something I study closely in my academic work.
  6. Separation of Prompts via YAML: All prompt templates are stored in prompts.yaml to decouple model logic from language design. This supports iterative experimentation and prompt versioning, making it easier to fine-tune tone, vocabulary, and evaluation rubrics.
  7. Streamlit Web Interface: The decision to add a Streamlit interface wasn’t just about accessibility; it was about transparency and engagement. A graphical interface allows users to interact with the AI pipeline, see its reasoning stages, and explore the refinement loop visually. It bridges research and usability, demonstrating that explainable AI design can also be approachable and interactive.
  8. GitHub + Streamlit Hosting: Hosting the project publicly reflects my commitment to open, reproducible research. It allows others to experience the model directly, clone the repository, or inspect the YAML-based prompt logic — an essential aspect of transparent AI development.
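The structured-prompting idea behind chat_strict_json (decision 2) can be sketched as a retrying JSON wrapper around a model call. This is a minimal illustration, not the project's actual implementation: the `call_model` callable here is a stand-in for the real LLM client, and the retry policy is an assumption.

```python
import json

def chat_strict_json(prompt, call_model, max_retries=2):
    """Ask the model for JSON and retry until the reply parses.

    `call_model` is a stand-in for the real LLM client; the actual
    helper may also request response_format={"type": "json_object"}.
    """
    for attempt in range(max_retries + 1):
        raw = call_model(prompt)
        try:
            return json.loads(raw)  # stable, machine-parseable handoff
        except json.JSONDecodeError:
            # Re-ask with an explicit reminder to emit valid JSON only.
            prompt = prompt + "\nReturn ONLY valid JSON."
    raise ValueError("model never returned valid JSON")

# Stubbed model call for illustration only.
fake_reply = '{"theme": "friendship", "age": 7, "length": "short"}'
meta = chat_strict_json("Classify: a girl and a magic apple", lambda p: fake_reply)
```

Because every agent consumes and emits parsed dictionaries rather than free text, downstream components never have to guess at the shape of their inputs.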

How It Works

When a user provides a story idea, the Classifier interprets intent and extracts structured context such as theme, tone, and age suitability. The Planner transforms this context into a coherent narrative outline, while the Storyteller expands it into a complete story using creative and age-appropriate language. Next, the Judge analyzes the story across multiple criteria (e.g., clarity, kindness, engagement, narrative arc, and vocabulary), returning both a numerical assessment and a critique. If any score falls below a quality threshold, the Refiner automatically rewrites the story based on that feedback, creating a closed feedback loop that enables self-correction and iterative improvement. This architecture demonstrates autonomous task execution, role-based coordination, and continuous refinement through feedback.
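The flow above can be sketched as a small orchestration loop. Each stage is passed in as a callable (stubbed here for illustration); the threshold value and the verdict field names are assumptions, not the project's exact constants.

```python
QUALITY_THRESHOLD = 4  # assumed minimum acceptable rubric score (1–5 scale)

def run_pipeline(idea, classify, plan, tell, judge, refine, max_rounds=3):
    """Autonomous mode: generate, score, and refine until every rubric
    dimension clears the threshold or the round budget runs out."""
    context = classify(idea)    # theme, tone, age suitability
    outline = plan(context)     # structured narrative outline
    story = tell(outline)       # full age-appropriate draft
    for _ in range(max_rounds):
        verdict = judge(story)  # e.g. {"scores": {...}, "critique": "..."}
        if min(verdict["scores"].values()) >= QUALITY_THRESHOLD:
            break               # all dimensions meet the bar
        story = refine(story, verdict["critique"])
    return story

# Scripted stubs: the Judge flags the first draft, then approves the revision.
verdicts = iter([
    {"scores": {"clarity": 3}, "critique": "simplify the opening"},
    {"scores": {"clarity": 5}, "critique": ""},
])
final = run_pipeline(
    "a girl and a magic apple",
    classify=lambda i: {"theme": "kindness"},
    plan=lambda c: ["beginning", "middle", "end"],
    tell=lambda o: "draft v1",
    judge=lambda s: next(verdicts),
    refine=lambda s, fb: "draft v2 (" + fb + ")",
)
```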

Autonomous Architecture (Fully AI-Driven)

User 
  │
  ▼
[Classifier] ──► [Planner] ──► [Storyteller] ──► [Judge] ──► [Refiner] ──► [Output]
                                      ▲                               │
                                      └─────────── Feedback Loop ─────┘

Human-in-the-Loop Architecture

User 
  │
  ▼
[Classifier] ──► [Planner] ──► [Storyteller] ──► [Judge] ──► [Refiner] ──► [Output] ──► [Human Feedback]
                                      ▲                          │    ▲                        │
                                      └────── Feedback Loop ─────┘    │                        │
                                                                      └── Human Feedback Loop ─┘
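The outer Human-in-the-Loop arc shown above re-injects user feedback into the Refiner. A minimal sketch of that editorial turn-taking, with `refine` and `get_feedback` as stand-ins for the real agent call and UI prompt:

```python
def human_in_the_loop(draft, refine, get_feedback, max_turns=3):
    """Human-in-the-Loop mode sketch: after each draft the user may offer
    feedback, which is fed back into the Refiner; an empty reply accepts
    the current story."""
    story = draft
    for _ in range(max_turns):
        feedback = get_feedback(story)
        if not feedback:  # user is satisfied
            break
        story = refine(story, feedback)
    return story

# Scripted "user" for illustration: one editorial note, then acceptance.
replies = iter(["make the ending calmer", ""])
result = human_in_the_loop(
    "first draft",
    refine=lambda s, fb: s + " [revised: " + fb + "]",
    get_feedback=lambda s: next(replies),
)
```

The same Refiner serves both loops; only the source of the critique (Judge vs. human) changes, which keeps the two modes architecturally symmetric.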

Pipeline Overview

User → Classifier → Planner → Storyteller → Judge ↺ Refiner → Output

  • Classifier — Purpose: Understands the user’s prompt (e.g., “a little girl and a magical apple”) and extracts metadata like age, category, length, and moral values. Key technique: prompt-driven JSON extraction via chat_strict_json().
  • Planner — Purpose: Creates a structured story outline (title, 5-bullet plot arc, reading-level hints). Key technique: controlled JSON generation.
  • Storyteller — Purpose: Expands the outline into a fully written story in an age-appropriate tone. Key technique: temperature-tuned creative generation.
  • Judge — Purpose: Evaluates the story using a rubric (clarity, kindness, engagement, vocabulary, arc, and age fit) and returns scores with a critique. Key technique: deterministic evaluation via response_format={"type": "json_object"}.
  • Refiner — Purpose: Revises the story based on the Judge’s critique until it meets the target quality score. Key technique: iterative self-improvement loop.

Future Vision

Current Storyland Includes

  • Modular multi-agent architecture (Classifier, Planner, Storyteller, Judge, Refiner)
  • Human-in-the-Loop co-creation mode for guided refinement
  • YAML-based prompting framework for maintainable and interpretable AI behavior
  • Iterative quality refinement loop via self-critique and re-generation
  • Story safety filtering to ensure age-appropriate, non-violent narratives
  • Streamlit-based web interface for intuitive, interactive storytelling
  • GitHub-hosted open demo for reproducibility and transparency

Possible Future Additions

  • Cultural dataset integration (global fairy tales and folklore for culturally grounded stories)
  • Multilingual story generation and local idiom adaptation
  • Voice narration and AI illustration support for immersive story experiences
  • User feedback and reinforcement-based learning loops
  • Emotional tone control (e.g., calming, curious, joyful modes)
  • Cloud API for educators and interactive learning tools
  • Research pipeline for analyzing narrative structure and empathy development

IMPACT

Storyland demonstrates how AI can complement human creativity in educational and recreational contexts. It empowers children with engaging and safe storytelling experiences while allowing parents, teachers, and researchers to participate in narrative co-creation. By combining structured reasoning with human-in-the-loop feedback, the project contributes to the development of more responsible, transparent, and emotionally intelligent AI systems. Additionally, it serves as a research platform for studying narrative design, empathy development, and computational creativity.