I Use AI All Day at Work. Here's What That Really Means.

On a recent episode of Hard Fork, Kevin Roose said something that stuck with me: “I want to know how people are actually using AI at work. Because the data that we have is largely self-reports and I think some firms have exaggerated how much they are doing with AI.”
He was interviewing Anton Korinek, an economist at the University of Virginia who studies AI’s impact on labor markets. Korinek’s view is nuanced: the productivity benefits so far have been real but limited, and the job market disruption has been modest. But he’s clear that as AI capabilities keep compounding, that calculus will shift significantly.
I think Roose is right to be skeptical of the self-reported data. Most AI use-at-work stories are highlight reels. They show the output, not the infrastructure. They show the finished content, not the months of work that made it possible to produce that content reliably, at scale, without sacrificing quality.
So here’s my attempt at an honest account. I work on IBM Redbooks at IBM, and over the past several months I’ve been building a structured AI workflow system using IBM Bob to transform how we produce technical documentation. The system is real, it works, and IBM Bob is reaching general availability on March 24. This is what building it actually looked like.
I don’t have magic prompts. I have infrastructure that took months of real work before it paid off.
The Problem I’m Actually Trying to Solve
I work on IBM Redbooks—a long-running technical publication series that helps enterprise teams understand and implement complex IBM technologies. Think of them as deeply technical field guides: the kind of documentation that a seasoned architect actually uses when planning a major deployment, not the kind that gets published and immediately shelved.
For decades, that meant monolithic PDFs: documents averaging 138 pages, painstakingly assembled, that functioned as single artifacts. When one chapter needed updating, you republished the whole book. When the technology moved (and enterprise technology moves constantly), the entire document aged out together.
The format problem was real. But it was just the beginning.
The deeper problem was duplication. The same concepts, the same explanations, the same foundational material got rewritten across dozens of publications by different contributors at different times. There was no single source of truth. There was just accumulated technical debt, compounding quietly over years.
Then there’s the audience problem. Writing for everyone means writing for no one. A beginner system administrator setting up their first environment needs completely different content than a chief sustainability officer trying to understand compliance implications. Traditional publications tried to serve both. They served neither particularly well.
And the people problem: the contributors who know this material best (the engineers, the architects, the deep SMEs) are not documentation professionals. They know IBM Z mainframe architecture the way a master carpenter knows wood grain. But ask them to navigate a style guide and a content taxonomy and a publication framework all at once, and you’ve built a wall between their expertise and the people who need it.
Finally: scale. Our 2026 roadmap calls for hundreds of modular topics across multiple campaigns, each targeting specific audience personas. Do the math. It’s not achievable with traditional workflows. This isn’t a “write faster” problem. It’s a system architecture problem.
I Built a System, Not a Prompt Collection
The most important thing I’ve done isn’t prompting AI better. It’s externalizing my expertise into reusable infrastructure.
This is, incidentally, the same philosophy behind Project Bob, which is reaching general availability on March 24. Bob isn’t a smarter autocomplete. It’s a system that understands your intent, your codebase, and your organization’s standards. The insight is the same whether you’re building software or documentation: generic AI asked generic questions produces generic results. The value comes from teaching the system your domain.
Here’s what that looks like in practice for content work.
The Knowledge Base
The first thing I built was a briefing document, not for humans, but for AI. A comprehensive file that explains what our publications are, what they’re for, how they’re structured, what quality looks like, and what the different types of content we produce are trying to accomplish.
Think of it like onboarding a new colleague. You wouldn’t hand them a project and say “figure it out.” You’d spend time explaining the context, the standards, the vocabulary. That’s what this document does, except I wrote it once and every AI session starts from it. The AI isn’t starting from zero each time. It’s starting from a shared understanding of the work.
Here’s an excerpt from the actual briefing document — the section that explains what a Redbooks Experience (RBX) is and why it exists:
# AI Agent Instructions: What is an IBM Redbooks Experience (RBX) and how to create one.
## WHAT IS RBX AND WHY IT EXISTS
### The Problem with Traditional Redbooks
Traditional IBM Redbooks are monolithic PDF publications (typically 138+ pages) that create multiple pain points:
- **Unwieldy**: Hard to read, search, and update; single change requires re-reviewing entire document
- **Locked content**: Information trapped in long PDFs, poor searchability and user experience
- **Redundant**: Same content rewritten across multiple Redbooks (e.g., "What is LPAR?" appears in 10+ publications)
- **Audience mismatch**: "Tome" assumes single reader progressing learn→adopt→deploy, but real users have specific roles and needs
- **Slow updates**: Updating one section forces republishing entire book, slowing delivery cycles
### What is RBX (Redbooks Experience)?
**RBX** is a new modular, topic-based approach where:
- Content is broken into **topics** (self-contained "knowledge blocks" answering one question each)
- Topics are **reusable** across multiple publications
- Topics are **assembled** into curated experiences tailored to specific personas, levels, and use cases
- Delivery is **web-first** with pick-and-choose modules (small PDFs optional)
- Updates affect **only changed topics**, propagating automatically to all RBX that use them
**Key Principle**: Write once, assemble many ways. Replace "publications" with flexible, persona-aware topic collections.
Notice what this is doing: it’s not explaining the framework to a human reader. It’s written for the AI — giving it the vocabulary, the design rationale, and the structural logic it needs to make good decisions independently. Every session starts here.
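In code terms, “every session starts here” can be as simple as prepending the knowledge base to the session context. This is a minimal sketch under assumptions: the file names and the comment-style delimiters are illustrative, not the actual system’s.

```python
from pathlib import Path

# Hypothetical knowledge-base files; the real system's file names differ.
KNOWLEDGE_BASE = ["rbx-briefing.md", "style-guide.md", "personas.yaml"]

def build_session_context(kb_dir: Path) -> str:
    """Concatenate the briefing documents into one context block
    that every AI session starts from, instead of starting from zero."""
    sections = []
    for name in KNOWLEDGE_BASE:
        path = kb_dir / name
        if path.exists():
            sections.append(f"<!-- {name} -->\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(sections)
```

The point of the sketch is the shape, not the mechanics: the shared understanding lives in files, and assembling it is automatic rather than a per-session ritual.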
The Persona Library
We currently write for nine distinct reader types. Each one has a different experience level, different technical vocabulary, different concerns, different patience for abstraction.
Rather than re-explaining “write this for someone who’s new to this technology and worried about making mistakes” every single session, I built a structured library. Each persona is documented: what they know, what they don’t know, what they care about, how they prefer to receive information. When I start a new piece of content, I reference the persona and the AI already knows who we’re writing for.
The library lives in a machine-readable YAML file — structured specifically so the AI can parse and apply it, not just read it:
# IBM Redbooks Persona Library
# Machine-Readable Format for Agentic Workflow
# Version: 1.0
# Date: 2026-02-05
# Source: personas.md
metadata:
  version: "1.0"
  last_updated: "2026-02-05"
  source_file: "personas.md"
  total_personas: 9
  description: "Structured persona definitions for IBM Redbooks content targeting and RBX assembly"
That format distinction matters. A human-readable persona description might say “write for a nervous beginner.” A machine-readable one specifies experience level, technical vocabulary, decision-making authority, risk tolerance, and preferred content depth, all in a format the AI can apply consistently without interpretation.
The payoff isn’t just speed. It’s consistency. Nine personas, hundreds of topics, one coherent voice for each.
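To make the format distinction concrete, here is a sketch of what “machine-readable” buys you. The field names mirror the attributes described above but are assumptions, not the real YAML schema:

```python
from dataclasses import dataclass

@dataclass
class Persona:
    # Hypothetical fields; the production schema has more dimensions.
    name: str
    experience_level: str  # e.g. "beginner", "expert"
    risk_tolerance: str    # e.g. "low" for a nervous first-time admin
    preferred_depth: str   # e.g. "step-by-step", "conceptual overview"

def persona_instructions(p: Persona) -> str:
    """Render a persona into explicit writing instructions the AI can apply
    consistently, instead of the vague 'write for a nervous beginner'."""
    return (
        f"Audience: {p.name} ({p.experience_level}).\n"
        f"Risk tolerance: {p.risk_tolerance}; flag destructive steps clearly.\n"
        f"Content depth: {p.preferred_depth}."
    )
```

Because the persona is structured data, every topic targeting that persona gets the same instructions, which is where the one-coherent-voice consistency comes from.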
The Style Guide
Formatting rules. Tone conventions. How to structure prerequisites. When to use a warning versus a note. How to handle technical terms the first time they appear. Whether to use active or passive voice in procedural steps.
None of this is glamorous. All of it matters. And none of it should live only in my head.
The style guide is a living Markdown document the AI references on every content task:
# Style Guide for IBM Redbooks Learning Content
## Document Information
- **Purpose**: Establish consistent stylistic conventions for IBM Redbooks learning paths and technical documentation
- **Scope**: Applies to all learning modules, technical guides, and educational content
- **Last Updated**: 2026-01-26
- **Status**: Living document - will evolve based on feedback and best practices
---
## Table of Contents
1. [Header and Navigation Conventions](#header-and-navigation-conventions)
2. [Internal Navigation Links](#internal-navigation-links)
3. [Content Structure](#content-structure)
4. [Transitions and Flow](#transitions-and-flow)
5. [Visual Spacing and Hierarchy](#visual-spacing-and-hierarchy)
6. [Metadata and Internal Notes](#metadata-and-internal-notes)
7. [Language and Tone](#language-and-tone)
8. [Formatting Conventions](#formatting-conventions)
9. [To Be Determined (TBD)](#to-be-determined-tbd)
Now I don’t re-explain formatting preferences session to session. I don’t catch the same style violations in review. The standards are in the system, not in my memory.
The Key Insight
Most people treat AI like a magic answer machine: ask a question, get an answer, close the tab. No memory, no standards, no continuity. Context engineering improves on that: the user supplies a prompt plus the information needed to answer it correctly. Intent engineering goes a step further: it means knowing what “done” looks like and giving an AI agent the instructions and context to complete a potentially long-running task.
I treat it like a trained colleague. One who already knows our framework, our audience, our quality bar. One who builds on previous work rather than starting fresh every time. The difference in output quality is not subtle.
Specialized Modes for Specialized Work
A generalist asked to do everything produces mediocre results at everything. It’s like putting your entire streaming music library on shuffle: the odds that the song playing is the one you actually want are small.
Once I had the foundational infrastructure in place, I built specialized modes—eleven of them—each scoped to a specific stage of the content pipeline. Think of it as an assembly line, except each station is intelligent rather than mechanical.
The Pipeline
An orchestrator mode coordinates complex, multi-stage projects. It breaks work into discrete stages, assigns each stage to the right specialized mode, and ensures handoffs don’t lose context. From there, a scope analyst runs before anyone writes a word. Then drafting. Then review.
Each stage has its own instructions, its own quality standards, its own definition of “done.”
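The orchestration pattern can be sketched in a few lines. This is a toy model under assumptions: the stage names come from the pipeline described above, but the shared-state mechanics are illustrative, not how the actual modes are implemented.

```python
from typing import Callable

# Each stage reads and extends one shared state dict, so handoffs
# between stages never lose context.
Stage = Callable[[dict], dict]

def scope_analysis(state: dict) -> dict:
    # Runs before anyone writes a word: pin down what is actually being asked.
    state["scope"] = f"scoped: {state['request']}"
    return state

def drafting(state: dict) -> dict:
    state["draft"] = f"draft based on {state['scope']}"
    return state

def review(state: dict) -> dict:
    # Each stage has its own definition of "done".
    state["approved"] = "scoped" in state.get("scope", "")
    return state

def run_pipeline(request: str, stages: list[Stage]) -> dict:
    """Orchestrator: pass the state through each specialized stage in order."""
    state = {"request": request}
    for stage in stages:
        state = stage(state)
    return state
```

The design choice worth noting: the orchestrator owns the handoffs, so no stage has to remember what came before it.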
Here’s what one of those mode definitions actually looks like — the RBX Content Assistant, which handles conversion of legacy content into modular topics:
customModes:
  - slug: rbx-assistant
    name: RBX Content Assistant
    roleDefinition: 'You are an expert IBM Redbooks content architect specializing in modular,
      topic-based authoring. Your expertise includes:
      - Converting traditional Redbooks (monolithic PDFs) into atomic topics (200-1000 word knowledge blocks)
      - Applying persona-driven metadata and audience targeting
      - Assembling topics into RBX (Redbooks Experience) publications
      - Enforcing RBX style guidelines and content standards
      - Identifying content reuse opportunities across multiple publications
      Your personality is:
      - Systematic and detail-oriented in content analysis
      - Collaborative and clear in explaining conversion decisions
      - Pragmatic about balancing ideal structure with practical constraints
      - Proactive in identifying reuse opportunities and quality issues
      You help content creators transform legacy documentation into flexible, user-centered learning experiences.'
    whenToUse: 'Use this assistant when you need to:
      ✅ Convert traditional Redbooks chapters into modular RBX topics
      ✅ Create new topic-based technical content from scratch
      ✅ Validate topic structure and metadata completeness
      ✅ Assemble topics into persona-specific RBX publications
      ✅ Ensure content follows RBX style guidelines
      ✅ Identify content reuse opportunities across products/audiences
      ✅ Decompose complex procedures into atomic knowledge blocks
      ✅ Map existing content to target personas and learning stages
      Don''t use for:
      ❌ General technical writing unrelated to IBM Redbooks
      ❌ Non-technical content conversion
      ❌ Product feature design or architecture decisions'
Notice the specificity: not just what the mode does, but when to use it and when not to. The boundaries matter as much as the capabilities. These mode definitions are what keep each station in the pipeline a specialist rather than a mediocre generalist.
The Proposal Vibe Check
This one came about from a weekly call I have with colleagues about AI usage in design.
Here’s the problem it solves: AI can produce polished-looking proposals in minutes. So can a person who hasn’t done real thinking yet. The surface of a proposal (the formatting, the confident language, the structured sections) no longer tells you whether the underlying thinking is solid.
Here’s the design principle baked into the mode definition itself:
<vibe_check_workflow>
<mode_overview>
The Proposal Vibe Check mode evaluates internal proposals, concept notes, and project
pitches against four sequential evaluation layers. It distinguishes between proposals
where deep thinking has occurred and proposals where AI has produced something that
looks like thinking but hasn't been validated, stress-tested, or grounded in reality.
This mode is a depth gauge, not a gatekeeper. Assessments read as constructive coaching.
</mode_overview>
<design_principle>
This mode is not a gatekeeper. It is a depth gauge. The goal is to make the real cost
visible alongside the polished pitch — not to kill ideas, but to ensure they are built
on solid ground. Every assessment must read as constructive coaching:
"Here is what would make this stronger" — not criticism.
</design_principle>
</vibe_check_workflow>
That framing (depth gauge, not gatekeeper) is the entire point. The mode evaluates proposals across ten dimensions of thinking. One dimension, as an example: strategic context. A surface-level proposal says “we’ll create content about sustainability reporting.” A deeply-considered proposal says “CFOs are being asked to report Scope 3 emissions for the first time, and they’re coming to their IT teams for data they’ve never had to collect before. Our content needs to bridge the gap between compliance requirements and the infrastructure changes needed to generate that data.”
The same topic, but a completely different level of understanding. The vibe check surfaces which one you actually have and shows the author exactly where their own thinking needs more work. It doesn’t replace judgment, but it does make that judgment more systematic.
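As a toy illustration of how a dimension-by-dimension assessment might work, here is a sketch that rates each dimension as “addressed” or “surface-level.” The two dimensions and the keyword heuristics are invented for the example; the real mode evaluates ten dimensions with far richer criteria than keyword matching.

```python
# Illustrative dimensions and signals only; the actual vibe check uses
# ten dimensions with qualitative criteria, not keyword lists.
DIMENSIONS = {
    "strategic_context": ["because", "need", "gap"],
    "audience_grounding": ["persona", "reader", "user"],
}

def vibe_check(proposal: str) -> dict:
    """Rate each dimension so the author sees exactly where their own
    thinking needs more work: a depth gauge, not a gatekeeper."""
    text = proposal.lower()
    return {
        dim: "addressed" if any(sig in text for sig in signals) else "surface-level"
        for dim, signals in DIMENSIONS.items()
    }
```

Even in this toy form, the output is per-dimension coaching rather than a pass/fail verdict, which is the design principle quoted above.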
AI as Team Onboarding Tool
The work that surprised me most: using AI to lower the barrier for my human teammates.
Getting Subject Matter Experts Contributing
The experts who know this material best have spent careers going deep on technical problems, not on documentation frameworks. Traditional onboarding required weeks: style guides, taxonomy training, workflow orientation. The result was that many SMEs simply didn’t contribute: the barrier was too high relative to their available time.
I used AI to build a fast-track onboarding path: a document that translates documentation jargon into plain language, explains our framework in terms that make sense to someone coming from a purely technical background, and gets a new contributor to their first draft in a single sitting rather than three training sessions.
The AI isn’t just helping me work faster. It’s helping more people contribute.
GitHub for Non-Engineers
Our content team works in version control now, which is a significant cultural shift for people whose previous workflow involved emailing local documents back and forth.
I built a beginner’s guide to Git framed entirely around concepts the team already understood. A repository is a shared filing cabinet. A commit is saving your work with a note about what you changed. A branch is a draft that hasn’t been approved yet.
AI helped me do that translation work at scale - not just writing the words, but pressure-testing the analogies, catching the places where the explanation assumed knowledge the reader didn’t have.
The Unglamorous Infrastructure Work
Nobody talks about this part. It’s where AI delivers some of its most concrete value.
Document Conversion at Scale
Two decades of technical publications exist as PDFs. That’s a lot of institutional knowledge locked in a format that’s difficult to search, impossible to reuse, and labor-intensive to update.
Manual conversion would take months. Using IBM’s Docling tool, I built a conversion pipeline that batch-processes these documents into Markdown, preserving tables, handling diagrams, and maintaining structure. Intelligent caching means we don’t reprocess documents unnecessarily.
The output isn’t glamorous, but it unlocks everything that comes next. You can’t build a modern content system on top of inaccessible legacy knowledge.
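The caching idea is simple enough to sketch. This is a minimal illustration, not the actual pipeline: the `convert` callable stands in for the real Docling conversion call, and the hash-keyed cache layout is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Callable

def convert_with_cache(pdf: Path, cache_dir: Path,
                       convert: Callable[[Path], str]) -> str:
    """Reprocess a document only when its bytes have changed: the cache
    is keyed on a content hash of the source PDF. `convert` is a stand-in
    for the actual Docling conversion step."""
    digest = hashlib.sha256(pdf.read_bytes()).hexdigest()
    cached = cache_dir / f"{digest}.md"
    if cached.exists():
        return cached.read_text(encoding="utf-8")
    markdown = convert(pdf)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_text(markdown, encoding="utf-8")
    return markdown
```

Keying on content rather than filename or timestamp means a republished-but-unchanged PDF costs nothing, while any real edit triggers exactly one reconversion.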
Systematic Quality Review
Four hundred modular topics. Manual review of each one for technical accuracy, style compliance, accessibility, and editorial quality is not feasible at that scale.
A reviewer mode runs multi-dimensional validation before anything publishes. It checks for the things that would be too tedious to catch consistently by hand and enforces the standards that exist in the style guide rather than relying on reviewers to remember them.
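A few of those mechanical checks can be sketched as code. The specific rules below (the 200-1000 word target echoes the topic definition earlier in this post; the others are invented examples) are illustrative, not the actual reviewer mode:

```python
import re

def review_topic(markdown: str) -> list:
    """Run the checks too tedious to catch consistently by hand;
    each finding names the standard it enforces. Rules are illustrative."""
    findings = []
    words = len(markdown.split())
    if not 200 <= words <= 1000:
        findings.append(f"Topic length {words} words; target is 200-1000.")
    if not markdown.lstrip().startswith("#"):
        findings.append("Missing top-level heading.")
    if re.search(r"\bclick here\b", markdown, re.IGNORECASE):
        findings.append("Link text 'click here' fails accessibility guidance.")
    return findings
```

The value is that the standard lives in the check, not in a reviewer’s memory, so it gets applied identically to topic number one and topic number four hundred.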
Version Control as Discipline
Every significant change to the system gets tracked with semantic versioning and a changelog entry. AI helps maintain this discipline, not by generating content, but by supporting the infrastructure that makes content trustworthy over time. As soon as a change is made to our system, Bob updates the README and the changelog.
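The mechanics of that discipline are small. Here is a hedged sketch of the semantic-versioning bump and a changelog entry; the change-type names and entry format are assumptions for illustration, not the system’s actual conventions:

```python
def bump_version(version: str, change_type: str) -> str:
    """Semantic versioning: breaking change bumps major, new feature
    bumps minor, anything else bumps patch."""
    major, minor, patch = (int(x) for x in version.split("."))
    if change_type == "breaking":
        return f"{major + 1}.0.0"
    if change_type == "feature":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

def changelog_entry(version: str, summary: str) -> str:
    """One entry per tracked change; newest entries sit first in the real file."""
    return f"## {version}\n- {summary}\n"
```

None of this is clever. Its value is that it runs on every change, which is exactly the kind of consistency humans are bad at and automation is good at.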
What I’ve Actually Learned
The investment is front-loaded.
Building the knowledge base, the persona library, the style guide, and the mode definitions took weeks of real work before the first project got underway. Every subsequent session starts from a dramatically higher baseline, and the payoff compounds. But if you’re not willing to make that initial investment, you’ll keep getting mediocre results and wondering why everyone else’s AI seems to work better than yours.
Specificity is everything.
Vague instructions produce vague output. The reason the vibe check works is that it defines exactly what “strategic depth” means - not as a feeling, but as ten concrete dimensions with clear criteria for what “addressed” looks like versus what “surface-level” looks like. The more precisely you can define quality, the more reliably AI can achieve it.
AI is a collaborator, not a replacement.
AI has context limits. Sessions end. Work has to be structured for handoffs. These are the same skills you’d need managing any collaborator: clear instructions, defined checkpoints, validation at each stage. I’ve come to think of it less as “AI-assisted work” and more as “human-assisted AI”: I’m the one who knows what good looks like. The AI is the one with infinite patience for executing against that standard.
Evaluation is the new skill.
In a world where AI produces polished-looking work instantly, generation is no longer the bottleneck. The critical skill is knowing whether something is actually good versus whether it just looks good. This is why the vibe check exists. The shift isn’t from “can you write?” to “can you prompt?” It’s from “can you write?” to “can you judge?” That judgment is the core human contribution now.
Your teammates need this too.
A significant portion of my AI work over the past months has been helping other people work more effectively: the SME onboarding path, the GitHub tutorial, the repeatable workflows. AI’s value isn’t just individual productivity. It’s team capability expansion. The best use of AI is often helping other people deploy their expertise more effectively.
What didn’t work: Early attempts at generating full-length drafts failed consistently. The output looked plausible and read poorly. The lesson was that generation only works when scope, audience, and structure are already settled—which is why the scope analyst mode runs first now. The modes that were too complex to maintain also quietly died. Simplicity in system design turns out to matter as much in AI infrastructure as anywhere else.
So When Someone Asks How I Use AI at Work
Here’s my answer to Kevin Roose’s question: I’ve spent months building a system that lets me use AI well. The self-reported data he’s rightly skeptical of, the “AI is transforming everything” claims, often reflects ambition, not infrastructure. Real transformation is slower, less glamorous, and more front-loaded than the highlight reels suggest.
Anton Korinek is probably right that the current productivity gains are real but bounded, and that the bigger disruptions are still ahead. What I’d add: the gap between organizations that have built structured AI systems and those still experimenting with ad-hoc prompting is going to widen considerably as capabilities compound. The work of building infrastructure matters now, not just because it helps today, but because it positions you to capture the gains that are coming.
IBM Bob reaching general availability on March 24 is one data point in that story. An enterprise-grade, agentic AI system built with security-first principles, Anthropic’s models under the hood, and the explicit goal of understanding your intent and your standards—not just autocompleting your next line. That’s what structured AI looks like at scale.
For me, at the level of a content team rather than a development organization, the principles are the same: externalize your expertise, build infrastructure, enforce quality systematically.
That’s what AI at work actually looks like. Your domain will be different, your standards will be different, your audience will be different. But if you’re relying on ad-hoc prompting to get consistent, high-quality results, you’re not getting the real thing yet.
Have questions about structured AI workflows for content or documentation teams? I’d genuinely like to hear how others are approaching this—especially in technical publishing and knowledge management. Drop a comment or reach out directly.
A note on process: I developed this post in conversation with Claude, Anthropic’s AI assistant. The ideas are mine, but Claude helped me connect them, pressure-test the argument, and draft the prose. The session itself was a back-and-forth: I’d push back, ask for revisions, add my own examples, and Claude would adapt. Less like querying a search engine, more like working with a digital colleague. There’s something fitting about using an AI tool to write about the disruption AI is causing. I stayed in the driver’s seat. This is me, speaking for my work.