Model Citizen — Episode 4
Spotify | Apple Podcasts | YouTube
Host Glenn Parham takes listeners inside Task Force Lima, the Pentagon’s two-year sprint to bring generative AI into the largest organization in the world. From ChatGPT’s first shockwave in DC to the political and technical battles over model authorizations, Glenn explains what it took to ship working prototypes, bring frontier models into government environments, and publish DoD-specific AI benchmarks. He also lays out his concept for AI Government Alignment—ensuring that government users get the compute, access, and permissions they need without running into the dreaded “As an AI model, I cannot…” wall.
Question | Glenn’s punch-line takeaway |
---|---|
Why was Task Force Lima created? | ChatGPT’s debut exposed a vacuum: no DoD guidance, no authorizations, no benchmarks. |
What was the hardest technical barrier? | Authorizing “unbounded” LLMs across air-gapped networks with wildly different risk profiles. |
Why push DoD-specific benchmarks? | Without mission-grounded evals, nothing beyond admin work will ever get authorized. |
What shocked you most in the benchmark results? | Chinese open-weight models beating U.S. baselines on U.S. military logistics. |
Time | Chapter |
---|---|
00:00 | Cold open: keynote at the Joint AI for Energetics Conference |
01:10 | What is Task Force Lima and why it existed |
05:40 | First guidance: “Don’t put TS/SCI into ChatGPT” |
09:00 | The two authorization paths: open-weight vs closed-weight |
17:30 | Experiments: Bravo hackathons, radio-linked AI agents, Combi robot |
22:50 | Axon LLM-Ops framework and fine-tuning DoD policy models |
28:20 | Compute bottlenecks and market-rate chaos ($10K vs $5M chatbots) |
33:10 | Building the DoD AI community: LLM Office Hours |
37:00 | Why benchmarks became non-negotiable |
42:00 | Founding GovBench & the Joint Staff Bench methodology |
48:40 | Shocking result: Chinese models outperforming U.S. on logistics |
53:20 | Roadmap: V2 benchmarks, inter-agency expansion |
57:00 | Audience Q&A: guardrails, permissions, and operational use cases |
01:04:00 | Closing thoughts & where to learn more |