Next Advertising Era - Google Privacy Sandbox: Google Topics API

What is Chrome Ad Topics, how does it work, and where is interest-based advertising heading in the future? The first episode of Next Advertising Era

Oct 26, 2023

Returning from summer vacation, we found the new prompt to activate the APIs of the Google Privacy Sandbox in Europe and in all those countries where explicit consent is required to profile online activities.

Google has released several features within the Privacy Sandbox, but the three most important from an advertising perspective are:

Topics API: called Ad Topics in Chrome
Protected Audience API: called Site-suggested ads in Chrome
Attribution Reporting API: called Ad measurement in Chrome

Why do I emphasize how they are named in Chrome? Because the APIs are open standards that any browser vendor or operating system developer can implement and name differently, as the Chrome team's own naming choices show. Who knows, maybe other browsers will soon show interest.

If you prefer, an Italian version of this article is also available.

What is the Topics API?

Third-party cookies make it possible to analyze a user's behavior across all the websites they visit and define a profile of interests. It's easy to see that this mechanism was not designed to protect the user's privacy, but all digital marketing experts know the potential of customer acquisition strategies based on the interests of their target audience. The Topics API tries to recreate this functionality without using third-party cookies and without infringing on the user's privacy.

How does Google Chrome Ad Topics work?

With the user's permission, Ad Topics analyzes the browser's history over a period of time called an epoch. Currently, an epoch lasts 7 days, and its start day is chosen at random by the browser. The browser classifies each domain against a predefined taxonomy of around 470 categories. One important thing to note: not every site a user visits is taken into account. The Topics API only considers domains where access to the user's topics has been requested, which in practice means sites that want to monetise their traffic. Classification and interest calculation are therefore not applied indiscriminately to the user's entire browsing history.

For each epoch, the browser looks at the categories of the sites you've visited and figures out your top 5 interests based on what you've visited the most.
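The ranking step can be sketched roughly as follows. This is only an illustration, not Chrome's implementation: the function name `topTopicsForEpoch`, the shape of the `visits` input, the `.example` hostnames, and the alphabetical tie-break are all assumptions made for the example.

```javascript
// Hypothetical sketch: rank the topics seen during one epoch by frequency.
// `visits` is an assumed input: one entry per page visit, listing the
// topics assigned to the visited hostname.
function topTopicsForEpoch(visits, limit = 5) {
  const counts = new Map();
  for (const visit of visits) {
    for (const topic of visit.topics) {
      counts.set(topic, (counts.get(topic) || 0) + 1);
    }
  }
  // Sort by descending count; break ties alphabetically so the
  // result is deterministic (an assumption, not Chrome's rule).
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([topic]) => topic);
}

const epochVisits = [
  { host: "yoga.example", topics: ["Fitness"] },
  { host: "hiking-holiday.example", topics: ["Fitness", "Travel & Transportation"] },
  { host: "knitting.example", topics: ["Crafts"] },
  { host: "knitting.example", topics: ["Crafts"] },
  { host: "knitting.example", topics: ["Crafts"] },
];

console.log(topTopicsForEpoch(epochVisits));
```

With this made-up history, Crafts ranks first because it was seen three times, followed by Fitness and then Travel & Transportation.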
When a website wants to know what topics you're into, it needs to ask your browser by using this command:

document.browsingTopics()
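In practice a caller would probably want to guard the call, since not every browser implements it and the user may have switched the feature off. Here is a minimal sketch, assuming the call returns a promise that resolves to a list of objects with a numeric `topic` id. The helper name `readTopics` and the choice to fall back to an empty list are mine, not part of the API.

```javascript
// Minimal sketch: ask the browser for topics, guarding against browsers
// that do not implement the Topics API. `doc` is passed in so the helper
// can also be exercised outside a browser; in a page, pass `document`.
async function readTopics(doc) {
  if (!doc || typeof doc.browsingTopics !== "function") {
    return []; // API not available: behave as if no topics were returned
  }
  try {
    const entries = await doc.browsingTopics();
    // Keep only the numeric topic id of each returned entry.
    return entries.map((entry) => entry.topic);
  } catch (err) {
    // The call can reject, for example when the feature is blocked.
    return [];
  }
}

// Usage: in a browser this passes the real `document`; elsewhere it passes null.
readTopics(typeof document !== "undefined" ? document : null).then((ids) => {
  console.log("topic ids:", ids);
});
```

Falling back to an empty list keeps the rest of the ad code on a single path: "no topics" and "API unavailable" are handled the same way.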

Each time the browser gets this request, it returns up to 3 of your interests, picked at random from the last 3 epochs. But here's a twist: in 5% of cases, one of those topics could be totally random. These steps are there to stop someone from building a detailed user profile simply by querying the browser multiple times, which would be a total privacy invasion. Moreover, an adtech vendor doesn't see interests that it has not explicitly observed before. A peek at the table will clear things up a bit.
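To make the selection step concrete, here is a small simulation. It is a sketch under stated assumptions, not Chrome's code: I assume one topic is drawn per epoch from that epoch's top interests, and that the 5% noise is applied to each draw. The names `pickTopics`, `epochs`, `taxonomy` and `noiseRate` are made up for the example.

```javascript
// Hypothetical simulation of the selection step; not Chrome's actual code.
// `epochs`: top topics of the most recent epochs, newest last.
// `taxonomy`: the full list of topics the browser knows about.
// `rng`: function returning a number in [0, 1), injectable for testing.
function pickTopics(epochs, taxonomy, rng = Math.random, noiseRate = 0.05) {
  const pickFrom = (list) => list[Math.floor(rng() * list.length)];
  const picked = new Set();
  for (const topTopics of epochs.slice(-3)) {
    if (topTopics.length === 0) continue; // nothing learned in this epoch
    // In roughly 5% of draws, return a random topic from the whole taxonomy.
    const useNoise = rng() < noiseRate;
    picked.add(useNoise ? pickFrom(taxonomy) : pickFrom(topTopics));
  }
  return [...picked]; // duplicates collapse, so at most 3 topics come back
}

const lastEpochs = [
  ["Fitness", "Crafts"],
  ["Travel & Transportation", "Fitness"],
  ["Crafts", "Fashion & Style"],
];
const allTopics = ["Fitness", "Crafts", "Travel & Transportation", "Fashion & Style", "News"];

console.log(pickTopics(lastEpochs, allTopics));
```

Because of the random draw and the occasional noise topic, two callers querying at the same moment cannot simply compare answers to fingerprint a user.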

Here's how it works: adtech vendors 1 and 2 on the yoga site see that the user is into Fitness. On the knitting site, adtech1 figures out the user's interest in Crafts, while on the hiking-holiday site, adtech2 picks up on interests in Fitness and Travel & Transportation. However, there's no adtech action on the diy-clothing site. So, if adtech1 asks for topics, it might get Fitness and Crafts, but it's left in the dark about Travel & Transportation. On the flip side, adtech2 can learn about Fitness and maybe Travel & Transportation, but it misses out on Crafts. The big thing is, neither catches on to your Fashion & Style interest, because the only site that reveals it is one where neither of them was present. It's an extra layer of privacy protection.
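The same example can be written down as a few lines of code. This is just a sketch of the "observed topics only" rule using the sites above; the data structure, the `.example` hostnames and the function name `observableTopics` are assumptions made for illustration.

```javascript
// Sketch of the "observed topics only" rule, using the example sites.
// `callers` lists the adtech vendors that requested topics on each site.
const sites = [
  { host: "yoga.example", topics: ["Fitness"], callers: ["adtech1", "adtech2"] },
  { host: "knitting.example", topics: ["Crafts"], callers: ["adtech1"] },
  { host: "hiking-holiday.example", topics: ["Fitness", "Travel & Transportation"], callers: ["adtech2"] },
  { host: "diy-clothing.example", topics: ["Crafts", "Fashion & Style"], callers: [] },
];

// A vendor can only ever receive topics of sites where it was present.
function observableTopics(caller, visitedSites) {
  const seen = new Set();
  for (const site of visitedSites) {
    if (site.callers.includes(caller)) {
      site.topics.forEach((topic) => seen.add(topic));
    }
  }
  return [...seen].sort();
}

console.log("adtech1:", observableTopics("adtech1", sites));
console.log("adtech2:", observableTopics("adtech2", sites));
```

Running it shows that Fashion & Style appears in neither list, since no vendor was present on the only site that carries it.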

If you're using Google Chrome, you can see the interests it's figured out from your browsing over different periods by heading to chrome://topics-internals/. Here's a screenshot from my own browser.

How does a domain get classified?

The browser takes care of classifying the domain: right now, it sorts things based on the hostname only, not the specific pages, all to keep user privacy in check. Things might shift down the line, but for now, Google's playing it super safe.

Your browser uses a model built with TensorFlow Lite that sits and runs right there on your device. Google trained this system on 10,000 manually classified domains. A domain can fit into more than one category. You can check out the list of these domains yourself: it is available on your device in the file override_list.pb.gz.
If you want to see the classification happening right in your browser, just try the classifier on any domain by heading to chrome://topics-internals/.
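The lookup can be pictured roughly like this. It is a sketch under an explicit assumption: that the curated list is consulted first and the on-device model only handles hostnames missing from it. `classifyHost`, `overrideList` and `stubModel` are hypothetical stand-ins, not Chrome's real data or classifier.

```javascript
// Hypothetical sketch of hostname classification: curated list first,
// model second. Neither `overrideList` nor `stubModel` is real Chrome data.
function classifyHost(url, overrideList, model) {
  // Only the hostname is considered; path and query string are ignored.
  const host = new URL(url).hostname;
  if (overrideList.has(host)) {
    return overrideList.get(host); // a host can map to several topics
  }
  return model(host);
}

// Made-up entries standing in for the manually classified domains.
const overrideList = new Map([
  ["yoga.example", ["Fitness"]],
  ["hiking-holiday.example", ["Fitness", "Travel & Transportation"]],
]);

// Stand-in for the on-device model: a trivial keyword rule.
const stubModel = (host) => (host.includes("knit") ? ["Crafts"] : []);

console.log(classifyHost("https://yoga.example/classes?level=2", overrideList, stubModel));
console.log(classifyHost("https://knitting.example/patterns", overrideList, stubModel));
```

Note that two different pages on the same hostname always get the same topics here, which mirrors the hostname-only behaviour described above.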

Using Chrome's interface, you can:

Check out the Topics you've browsed
Block certain categories so they won't be used in sorting your interests.

Honestly, I find the whole process pretty straightforward and easy to handle.

Wrapping Up

Chrome's Topics API is definitely pushing the boundaries in user privacy. It's early days, so it's hard to say if it'll totally take off, but Google and the ad tech scene are going to spend the next few months figuring out if it's doing what it should, tweaking things along the way.

A significant number of ad tech companies are exploring the capabilities of Privacy Sandbox technologies, with particular attention to the Topics API. I'm not in a position to say for sure how bulletproof it is in terms of keeping user privacy safe, but from what I've dug up, it's light years ahead of old-school third-party cookies: it doesn't let anyone easily piece together any one person's online footprint. Plus, an ad tech firm can only see interests it has already observed in your browser, which caps its chances of building a richer profile of you from your email or IP address.

For advertisers, the landscape is shifting: right now, the Topics API may lack the precision of existing technologies designed to discern user interests. However, it could evolve significantly over the next 12 months. If it stays on its current trajectory, advertisers will have to adapt their strategies. That means launching campaigns that cover a wider range of user interests and then using first-party data to guide consumers through the various phases of the acquisition journey.

For publishers, the landscape is evolving as follows: publishers using Google Ad Manager as their ad server will find that the Topics API testing phase is activated automatically, a setting they can deactivate if they wish. The Topics API can also be added as a module within the Prebid framework.

In the coming months, traffic monetized via the Topics API is expected to grow. Publishers would be wise to keep a close eye on how their site is interpreted and categorized by the classification model.

Note that there is currently no mechanism to request changes or updates to the classifications made by the model, though this is under discussion. In the future, publishers may be given a way to report the inconsistencies they find.

The fundamental goal of the Topics API aligns with what I've always envisioned for the Next Advertising Era: changing everything in order to change nothing. The infrastructure of advertising will be transformed, yet the change will be invisible to advertisers, who will continue to buy advertising using methods that are similar, if not identical, to those of today.

Reading Suggestions

Be Data Literate by Jordan Morrow, which is driving my professional choices

Present Beyond Measure: Design, Visualize, and Deliver Data Stories That Inspire Action by Lea Pica, which is supporting my data storytelling

IAB Tech Lab Identity Solution guide, from IAB, to understand Alternative IDs

Marketing in the messy middle, Google's document to guide us through the Messy Middle: Mountain View's perspective on summarising the user's decision-making process.

dataMesh: data with Filippo
