Introducing agentic news

August 6, 2025

By Cory Ondrejka, Chief Technology Officer, SmartNews

SmartNews does not build or train Large Language Models (LLMs) on publisher data, nor are we using data from our existing apps in NewsArc. But NewsArc relies on AI. So what are we doing?

What’s happening in the world right now

News is an incredibly vibrant and constantly changing medium. Being new is its signal characteristic. Thanks to our publisher partners, we have access to far more – and far broader – information than anyone could possibly process in a day. Understanding all of it is a deep technical challenge, but it allows us to create a view of what’s happening in the world right now.

This was impossible before Large Language Models (LLMs). News events are constantly changing and evolving, publishers compete with each other for attention, and different types of news are covered in wildly varying ways. We originally started from a relatively conventional position of mixing ML classifiers, clustering techniques, and ranking with a sprinkling of LLMs to discover and generate signals too complex for conventional Natural Language Processing (NLP). LLMs were expensive, so we worked around them.

We also considered whether model training would help in solving the problem. From our research and discussions with publisher partners, it became clear to us that training models with news content – whether foundation model development, pre-training, or fine-tuning – was a terrible idea. Since training biases the model towards the highest-volume editorial and style materials you give it, we'd generate a model built to imitate our partners, irrevocably tied to their data. Worse, training tended to shift our thinking towards the kind of features partner-trained models do trivially – summaries, bullets – which didn't match our vision for NewsArc.

It became one of SmartNews's foundations: we wouldn't train on partner content. It was a perfectly timed decision.

The electric car moment

Friends in the automotive world talk about how different an electric car is compared to a hybrid. When you don't have to store gasoline or manage separate 12V systems, everything around packaging, performance, and efficiency improves. It's the same with LLMs. As the foundation model leaders compete on price, LLM capabilities become radically more available. Prices fell so far that even our most optimistic estimates turned out to be an order of magnitude too high.

Suddenly, we were able to ponder the question: what if the entire understanding pipeline were just LLMs? No conventional ML or clustering at all.

We had already been simplifying our stack to accelerate development, and, like removing the internal combustion engine, ditching all the non-LLM pieces made for a radically simpler system. Overnight, our pipelines changed: instead of journalists on our team explaining a cluster to ranking engineers, those journalists could edit and test prompts on their own.

Quality and context

Building ranking, prioritization, and delivery systems that are LLMs “all the way down” meant thinking about context at every step of the process. 

Consider an emerging news event like the massive Kamchatka earthquake in the last week of July: coverage ran from the first moments of detection, through updates on the magnitude and tsunami warnings throughout the Pacific, all the way to the tsunamis' arrivals, the damage, and the follow-ups. For some of these steps, context is straightforward. Error detection and initial scoring operate at the article level, and any capable LLM handles them well – part of why we see so many bullet and synopsis experiences. Entity recognition works at this level too. It's at the event level that things become more interesting.
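To make the article-level step concrete, here is a minimal prompt-and-parse sketch. The prompt wording, JSON field names, and scoring scale are illustrative assumptions, not SmartNews's actual prompts or schema:

```python
import json

# Hypothetical article-level prompt. The wording, JSON fields, and the
# 0.0-1.0 scoring scale are our assumptions for illustration only.
ARTICLE_PROMPT = """Read the article below. Return JSON with:
  "entities": named people, places, and organizations mentioned,
  "quality_flags": problems found, e.g. "clickbait_headline",
  "initial_score": newsworthiness from 0.0 to 1.0.

Article:
{article_text}"""

def build_prompt(article_text: str) -> str:
    # Article-level context: each article stands alone, no event state needed.
    return ARTICLE_PROMPT.format(article_text=article_text)

def parse_response(raw: str) -> dict:
    """Validate the model's JSON so malformed output fails fast, not downstream."""
    data = json.loads(raw)
    if not isinstance(data.get("entities"), list):
        raise ValueError("entities must be a list")
    if not 0.0 <= data.get("initial_score", -1.0) <= 1.0:
        raise ValueError("initial_score must be in [0, 1]")
    return data
```

Because every article is processed independently at this stage, the step parallelizes trivially; the harder context questions only appear once articles must relate to each other.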

Let me give examples of two steps in our process that help us understand events: deduplication and context management.

First, deduplication. There is a surprising amount of duplication in news reporting. In some ways, it's similar to the shockwaves and tsunamis from the earthquake, moving across the world from the first sources to report it, with spikes and amplification from the largest publishers and their morning editorial meetings. To a large degree, this is a feature, not a bug. Duplication and review by fellow journalists is how stories get refined, tested, and confirmed over time. But for a reader, receiving two stories that effectively tell you the same information isn't respecting your attention. So, an important step is both literal and semantic deduplication. Every ML and AI engineer thinks this step will be easy. It isn't. Despite advances in context windows and needle-in-a-haystack retrieval, LLMs are still surprisingly sensitive to token order and repetition in the context. Not ideal, but solvable.
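A toy sketch of the two layers of deduplication described above: exact matching on normalized text, plus a near-duplicate check. SmartNews does the semantic comparison with LLMs; the shingle-overlap similarity below is a crude stand-in, and the threshold is tuned only for this example:

```python
import hashlib
import re

def normalize(text: str) -> str:
    # Lowercase and strip punctuation so trivial variants hash identically.
    return re.sub(r"\W+", " ", text.lower()).strip()

def literal_key(text: str) -> str:
    # Exact-duplicate detection: a hash of the normalized text.
    return hashlib.sha256(normalize(text).encode()).hexdigest()

def shingles(text: str, n: int = 3) -> set:
    # Word n-grams; a stand-in for real semantic (embedding/LLM) comparison.
    toks = normalize(text).split()
    return {tuple(toks[i:i + n]) for i in range(max(len(toks) - n + 1, 1))}

def jaccard(a: set, b: set) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def dedupe(articles: list[str], threshold: float = 0.4) -> list[str]:
    """Drop literal and near duplicates, keeping the first occurrence."""
    seen_hashes, kept, kept_shingles = set(), [], []
    for text in articles:
        h = literal_key(text)
        if h in seen_hashes:
            continue  # literal duplicate
        if any(jaccard(shingles(text), s) >= threshold for s in kept_shingles):
            continue  # near duplicate of something already kept
        seen_hashes.add(h)
        kept.append(text)
        kept_shingles.append(shingles(text))
    return kept
```

Even this toy version shows why the step is harder than it looks: the right similarity threshold depends on story length, topic, and wire-copy conventions, which is exactly where semantic judgment beats surface overlap.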

Next is determining how – or if – the story fits into an existing news event. Context management here is key, particularly as stories ebb and flow over the course of hours, days, or even weeks. You have to be able to create knowledge around the existence of a story, what a story feels like to a reader, and how strongly to hold onto it. This knowledge is what allows us to mute news events in NewsArc. You're not just telling the system you want to hide a particular article or publisher, but that you really just don't want to keep reading about a particular story. We'll do our best to respect that across multiple days, even as the story evolves – it's the kind of technical challenge NLP always seemed capable of handling but can't.

These are just two steps in how we build a real-time, evolving view of the world and what news is happening in it. All of it is scored and ranked so every interaction in NewsArc – from the Front Page to your own custom Collection – respects your attention and intent, because we start with a cleaner, better-understood picture of the news.

LLMs and Agents

While AI Agents are the buzziest of buzzwords in the middle of 2025, they're a fairly simple concept. An agent is an AI you give the power to act on your behalf – to explore and come back to you with recommendations, solutions, or new questions. Agents are the mechanism for truly leveraging the power of LLMs to find unexpected connections, those serendipitous moments of delight. Thanks to the Arc platform – our foundational idea of what is happening in the world – we have a very high-quality, very structured view for our agents to explore. Daily Dozen, Collections, and Variety all use agents to refine our scoring, ranking, and connections, to ensure every article we deliver to you is the best way to get the whole story.

This is the start of what agentic news experiences can be. We can't wait to explore it with you!