The Chrome extension that pulled me into GenAI
The idea started in late 2024, right after I came back from a sabbatical. I returned to a pile of context: meeting recordings I had not watched, documents I had not read, notes scattered across internal systems, and an upcoming architecture discussion where I needed to be useful quickly.
The obvious answer was to use AI. The practical answer was not obvious at all. Getting the right transcript into the right tool was clumsy. Some of the material was sensitive. Some of the rules around which information could go into which AI system were still unclear. Open-source models that I could run myself felt like the safer first place to experiment.
So I built a very basic Chrome extension. The first version did one thing: read the page I was on and produce a one-page summary. It was not a platform. It was not polished. It was a tiny tool sitting next to the work, trying to remove the most annoying step between a long document and a useful understanding of it.
The first product decision was location
I could have built a separate web app where people pasted content into a text box. That would have been easier technically, but it would have asked users to change how they worked. The documents were already in the browser. The meetings were already in the browser. The questions appeared while reading, not after leaving the page.
A browser side panel was the product surface that made sense. It kept the assistant close to the source material. It made summarization feel like an extension of reading rather than a separate workflow. That sounds small, but it changed how people reacted to the tool.
I started with Confluence pages. Then I tried team meetings. Getting meeting transcripts was not straightforward at the time, so I watched the network tab, inspected the API calls, and found where the transcript data was coming from. That was the first moment the extension moved from a toy into something that solved a real daily problem.
People wanted the meeting version
The reaction was immediate. Everyone has meetings they wish they could absorb faster. Once people saw a summary appear next to a recording or a long document, the next requests came naturally: summarize more pages, summarize multiple tabs, answer questions across open tabs, help me prepare before a discussion, help me catch up after being away.
That changed the way I thought about the project. I was no longer building a feature because it was interesting. I was watching a pattern of pain repeat across users. People did not want "AI" in the abstract. They wanted a faster path from scattered context to a confident next action.
The product was not summarization. The product was catching up fast enough to participate in the next conversation.
That framing helped me prioritize. A better prompt mattered, but not as much as reducing the number of steps. A sharper summary mattered, but not as much as letting someone ask, "What changed while I was out?" or "What should I know before this architecture review?"
Summary of summaries only goes so far
Multi-tab summarization was the next obvious feature. At first, I used a simple summary-of-summaries approach. Summarize each tab, combine the summaries, then ask questions over that reduced context.
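A minimal sketch of that summary-of-summaries flow, not the extension's actual code. The `TabContent` type, the `Summarize` function signature, and the `summaryOfSummaries` name are all assumptions made for illustration; the summarizer is passed in so that any model call could sit behind it.

```typescript
// Hypothetical shape for the text pulled from one open tab.
interface TabContent {
  title: string;
  text: string;
}

// Any async function that turns long text into shorter text.
// In practice this would wrap a model call; here it is just a parameter.
type Summarize = (text: string) => Promise<string>;

async function summaryOfSummaries(
  tabs: TabContent[],
  summarize: Summarize
): Promise<string> {
  // Step 1: summarize each tab independently.
  const perTab = await Promise.all(
    tabs.map(async (tab) => `${tab.title}: ${await summarize(tab.text)}`)
  );

  // Step 2: combine the per-tab summaries and compress them once more.
  // Anything dropped in step 1 is no longer available at this point.
  return summarize(perTab.join("\n"));
}
```

The second step only ever sees the output of the first, so questions asked afterwards are answered from text that has already been compressed at least once.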
For a high-level recap, it was fine. For precise question answering, it was not. Important details disappeared during compression. A summary might preserve the conclusion but drop the caveat, the owner, or the exact tradeoff that made the answer useful. That taught me a lesson I still come back to: summarization and retrieval are different jobs.
The better answer was retrieval-augmented generation (RAG) across the open tabs. But I wanted to explore a cheaper and more private version before reaching for a heavier backend. So I built a simple in-memory retrieval flow. The extension chunked page content, generated embeddings with a small open-source model, stored the vectors locally for the session, and retrieved the most relevant chunks when the user asked a question.
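The chunk, store, and retrieve steps can be sketched roughly as follows. This is an illustration under assumptions, not the extension's real implementation: `chunkText`, `cosineSimilarity`, and `SessionVectorStore` are hypothetical names, the chunk size and overlap are arbitrary, and the embedding step is left out entirely, so the vectors are assumed to come from whatever small model runs in the browser.

```typescript
// Hypothetical record for one embedded chunk of page content.
interface Chunk {
  tabId: number;
  text: string;
  vector: number[]; // assumed to be produced by an embedding model elsewhere
}

// Split text into fixed-size character windows with a small overlap,
// so a sentence cut at one boundary still appears whole in a neighbor.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("size must be positive and overlap must be smaller than size");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Lives only in memory for the session; nothing is persisted or sent out.
class SessionVectorStore {
  private chunks: Chunk[] = [];

  add(chunk: Chunk): void {
    this.chunks.push(chunk);
  }

  // Brute-force scan: score every chunk, return the k best matches.
  search(queryVector: number[], k = 3): Chunk[] {
    return this.chunks
      .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((entry) => entry.chunk);
  }
}
```

A brute-force scan is a deliberate simplification here. With only a handful of open tabs the number of chunks stays small, so an index structure would add complexity without much benefit, and the retrieved chunks can be passed to the model alongside the user's question.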
It was limited. Browser memory is real. Model size matters. Embedding dimensions matter. Latency matters. I had to pick a small model and accept that the retrieval quality would not match a production system. But the constraint was the point: if this was going to live inside a browser extension, the architecture had to respect the environment.
Privacy became a feature, not a disclaimer
The more people tried the extension, the more privacy came up. Some teams were comfortable using approved hosted models. Other teams wanted stronger isolation for sensitive documents. That pushed the design in two directions: a productionized path for scale, and a local in-memory path for cases where users wanted the content to stay closer to the browser.
Transformers.js and small local models made that possible enough to be useful. Not perfect. Not magical. Useful. A user could summarize or ask questions against sensitive material with a tighter data boundary, and the product could explain the tradeoff plainly: local processing gives you more control, but you pay for it in capability, speed, and memory.
That is where engineering and product thinking had to meet. A feature is not only what it can do on a happy path. It is also what it refuses to do, what it keeps local, what it sends out, and how clearly it tells the user the difference.
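One way to make that boundary explicit is to have the routing decision return both a path and a plain-language notice for the user. The sketch below is hypothetical: the `choosePath` function, its option names, and the notice wording are assumptions for illustration, not the rules the extension actually applied.

```typescript
type ProcessingPath = "local" | "hosted";

interface RouteDecision {
  path: ProcessingPath;
  notice: string; // shown to the user so the data boundary is visible
}

// Hypothetical inputs: whether the user marked the content as sensitive,
// and whether an approved hosted model is available to them.
interface RouteOptions {
  sensitive: boolean;
  hostedApproved: boolean;
}

function choosePath(options: RouteOptions): RouteDecision {
  // Default to the tighter boundary whenever there is any doubt.
  if (options.sensitive || !options.hostedApproved) {
    return {
      path: "local",
      notice:
        "Processed in your browser. Content stays local, but answers may be slower and less capable.",
    };
  }
  return {
    path: "hosted",
    notice: "Sent to an approved hosted model for faster, more capable answers.",
  };
}
```

Returning the notice together with the path keeps the explanation from drifting away from the behavior: the UI cannot show one boundary while the code uses another.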
The experiment changed my path
I did not expect the extension to change my role. But the internal interest kept growing. The tool opened conversations with people across the company, from engineers who wanted to use it every day to senior leaders who saw the larger opportunity. It eventually helped me move into the AI team, where there were more resources to turn the idea into something production-ready and scalable.
That shift mattered. The early prototype proved demand. The production version had to answer harder questions: how to scale beyond one user's browser, how to respect data boundaries, how to support different model choices, how to give teams confidence, and how to make the experience reliable enough for company-wide use.
Around the same period, the broader market started moving in the same direction. AI assistants were getting closer to the browser. Side panels, page-aware assistants, meeting summaries, and browser-native workflows started to feel less like experiments and more like an inevitable product category. It was satisfying to see the pattern show up elsewhere, because the user need had been obvious from the first demo: people want AI where their context already lives.
What I learned
- Start with the user trying to get somewhere. The real job was not summarizing text. It was helping someone rejoin a conversation with enough context to make a good decision.
- Put the tool in the workflow. A side panel worked because it reduced movement. The browser was already the workspace.
- Do not confuse compression with knowledge. Summary-of-summaries can help with recaps, but question answering needs access to the right source chunks.
- Respect the runtime. Browser extensions live with memory, latency, permission, and UX constraints. The architecture has to fit the surface.
- Make privacy a product choice. Users should understand when content is local, when it is sent to a hosted model, and what tradeoffs come with each path.
Looking back, this was the beginning of my serious move into GenAI. Not because I started with a grand strategy, but because a real problem was sitting in front of me: too much context, too little time, and a browser full of material that people needed to understand. Curiosity got the first version built. Product thinking made it useful. Engineering made it possible to scale.