Talking TV: How News Content Authentication Is Battling AI
The Coalition for Content Provenance and Authenticity (C2PA), a group of technology and media companies, was formed to help combat disinformation by authenticating news content at its source. It was a tough job at the outset, but the emergence of generative AI is making it much harder as bad actors are equipped with ever-better tools.
Pia Blumenthal co-chairs the C2PA’s UX Task Force, a role she holds alongside her day job also fighting disinformation as design manager for the Content Authenticity Initiative at Adobe. In this Talking TV conversation, she explains the work she’s doing in each capacity.
It’s work every newsroom needs to get acquainted with as opportunities to manipulate their own news products proliferate. Content authentication will likely become an essential tool for retaining trust, which is already eroding heavily in an age rife with disinformation and misinformation.
Episode transcript below, edited for clarity.
Michael Depp: The Coalition for Content Provenance and Authenticity, or C2PA, was formed to tackle the prevalence of misleading information online by developing technical standards for certifying the source and history, or provenance, of media content. Essentially, C2PA is building tools to ensure that content is actually coming from where it purports to come from.
This coalition, which comprises Adobe, Microsoft, Intel, the BBC, Sony and others, has its work cut out for it given the proliferation of misinformation and disinformation and the ever-growing sophistication of the tools used to propagate it.
I’m Michael Depp, editor of TVNewsCheck, and this is Talking TV. Today, a conversation with Pia Blumenthal, design manager at Adobe, where she leads design for the company’s Content Authenticity Initiative (CAI). She’s also co-chair of the C2PA UX Task Force. We’ll be catching up on the very latest on where this provenance authentication is progressing and how it is adapting to developments in AI. It’s an essential conversation for every newsroom concerned with the authenticity of the content it receives and disseminates, which is to say every newsroom. We’ll be right back.
Welcome, Pia Blumenthal.
Pia Blumenthal: Hi Michael. Thank you so much for having me today.
Thanks for being here. Pia, first, for the uninitiated, can you frame up the nature of the work that you do at C2PA? It’s an awkward acronym that sounds a little bit like a Star Wars droid, but it’s not a droid. Frame up the work you do there and at Adobe and where this intersects with news content.
Of course. Well, actually, let me invert that order. The Content Authenticity Initiative is an Adobe-led initiative. We’re a community of, at this point, about 1,500 members, including media and tech companies, NGOs, academics and others working to promote the adoption of an open industry standard for content authenticity and data transparency. The C2PA, the Coalition for Content Provenance and Authenticity, on the other hand, is a collaboration between the CAI and another previously existing entity, Project Origin, led by Microsoft and the BBC. These two projects merged to form the technical standards body driving best practices and the design of how we implement provenance across all media and content types, really any type of implementation, from publishers to social platforms, handling a variety of concerns, AI today being chief among them, and how we might make content more transparent.
To your knowledge, are newsrooms sufficiently aware of what C2PA is and what this work is all about?
We do have a number of both wire services and news media publishers who are investing in the CAI, and we hope they soon begin their own implementations of the C2PA standard. To assist with that, the CAI has developed a suite of open-source tools built on the C2PA spec that anyone, but especially publishers, can begin to integrate into their systems to help their consumers, and really audiences beyond their own platforms, understand where content is coming from, who’s responsible for it and what may have happened to it along the way.
And as I understand it, there’s been a bit of a road show going on over the last year or so to proselytize this and get the word out in media circles.
Yes, that’s certainly true. Our mission began with addressing mis- and disinformation concerns, which, of course, are being accelerated by all of the new generative AI technology that we’re seeing today. But even several years ago, around 2019, when this was first introduced by Adobe, we saw what happened with the Nancy Pelosi cheapfake. It was a simple edit slowing the speed of a clip of Pelosi speaking to make her look like she was slurring her words. That’s something we call a cheapfake. Those concerns are accelerated now, and it’s very hard to actually detect whether something is a trustworthy source.
So, do you call it a cheapfake because it was sort of simply done and it wasn’t very sophisticated and easy to spot?
Correct. Exactly. You know, we don’t need a ton of sophisticated technology to still intentionally mislead people.
Right. OK. So, tell me about the progress that you are making, generally speaking, in terms of being able to authenticate more types of media content now.
Of course. So, I would say implementations have largely started with images: photos, or images created in software like Adobe Photoshop. On the C2PA UX best practices side, we are working toward implementations around video and audio provenance, and soon documents like PDFs. We try to outline how people need to interact with different media types in a variety of scenarios.
So, the best practices have to be super flexible to handle any type of content, any place where it could be surfaced, and, more importantly, the types of information that could be unique to that content: anything from the identity associated with the creators or editors, to the types of edits that might have happened, to the ingredients that were used to create those pieces of content. We attach all of that to the content itself, so it follows the content wherever else it may go and, over time, builds a rich trail of provenance information that someone can look back through and hopefully trace to the origin point.
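To make that idea concrete, here is a minimal sketch of such a provenance trail in Python. It is loosely modeled on the C2PA manifest concept; the field names and tool names are simplified for illustration and are not the spec’s exact schema.

```python
# Illustrative sketch of a provenance trail: each edit records the
# manifest of what it was made from as an "ingredient", so a viewer
# can walk the chain back to the origin point.

original_capture = {
    "claim_generator": "ExampleCamera Firmware 1.0",  # hypothetical device
    "assertions": [
        {"label": "creator", "value": "Jane Photographer"},
        {"label": "capture_time", "value": "2023-08-01T09:30:00Z"},
    ],
    "ingredients": [],  # an origin point: nothing came before it
}

edited_version = {
    "claim_generator": "ExamplePhotoEditor 2.4",  # hypothetical app
    "assertions": [
        {"label": "actions", "value": ["crop", "color_adjustment"]},
    ],
    # the edit references what it was made from, extending the trail
    "ingredients": [original_capture],
}

def trace_to_origin(manifest: dict, depth: int = 0) -> None:
    """Walk the ingredient chain back to the earliest recorded manifest."""
    print("  " * depth + manifest["claim_generator"])
    for ingredient in manifest["ingredients"]:
        trace_to_origin(ingredient, depth + 1)

trace_to_origin(edited_version)
# ExamplePhotoEditor 2.4
#   ExampleCamera Firmware 1.0
```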
Where does it follow it exactly? Is it sort of a metatag string, or how does it manifest inside of this piece of content?
What we do is we take all of this metadata, some of which already exists. As much as possible, the C2PA relies on existing metadata frameworks like Schema.org, Exif for cameras and, of course, IPTC for photography, and we package that into the content itself. I think the biggest differentiator between us and other types of metadata is that we apply a level of verification through a digital signature.
And so that really means there’s a responsible entity, the signer, who says this is the state of this data at the time it was exported, saved or created. That data either lives within the content itself or is referenced in an external cloud record, so that if the data is ever stripped, there is a record that can be re-paired with the content through something we call soft binding, or digital content fingerprinting.

So, we basically look at that content, match it against what’s stored in the cloud and, if the embedded data has been stripped off, refer back to it through the cloud.
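A minimal sketch of the signing idea, assuming the third-party Python cryptography package: the real C2PA spec signs with X.509 certificate chains in a COSE structure, which this toy version simplifies to a bare key pair, but it shows how a signature binds metadata to the exact content bytes.

```python
# A responsible entity signs a hash of the content plus its metadata,
# so any later tampering with either one is detectable.
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signer_key = Ed25519PrivateKey.generate()  # stands in for a signer's credential

content = b"...image bytes..."
metadata = {"creator": "Jane Photographer", "edits": ["crop"]}

# Bind the metadata to the exact bytes of the content via a hash.
payload = json.dumps(
    {"content_sha256": hashlib.sha256(content).hexdigest(), "metadata": metadata},
    sort_keys=True,
).encode()
signature = signer_key.sign(payload)

# A validator re-derives the payload and checks the signature
# (in practice, using the public key from the signer's certificate).
try:
    signer_key.public_key().verify(signature, payload)
    print("credential intact: signer vouches for this state of the data")
except InvalidSignature:
    print("content or metadata changed since signing")
```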
How does the content originator make that digital signature? Is that something that’s embedded in the Adobe program, for instance, on which it’s being edited?
This goes a little bit beyond my expertise as a designer, but our signature model, our trust model, is based on the existing one that you might see across the internet. How do you know a website that you go to is trustworthy? You look for that little browser lock, right? There’s an SSL trust certificate that a series of different entities issue and agree to respect. The absence of that trusted certificate signals to you as the viewer that, you know, you might not want to look at this, or you proceed with caution.
And so that’s essentially how our trust model works. Adobe is in and of itself a trusted entity that issues signatures for applications like Adobe Photoshop or Lightroom, where we have a beta experience being developed, and any of the other soon-to-release features. Adobe in this case is the signer; Photoshop would be the machine validator of any edits someone might make to a piece of content. Then, going down the trust signal list, we also have anything that a person can manually enter about their content. That’s where identity comes into play.
But in order to support your identity claim, we have, at least within the Adobe ecosystem, created a series of connected accounts, social media or Web3 accounts, that someone can authenticate into and then include in their content credentials to help give them that social proof, kind of in the absence of a verified identity service, which is something that we are collectively working towards.
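Conceptually, that validation resembles the sketch below: a validator accepts a credential only when its signer appears on a trust list, much as a browser checks a site certificate against trusted authorities. Real C2PA validation walks an X.509 certificate chain; this flattens it to a simple lookup, and the second signer name is purely illustrative.

```python
# Toy trust-list check: no credential or an unrecognized signer yields
# a "proceed with caution" signal rather than a hard failure.

TRUSTED_SIGNERS = {"Adobe", "ExampleWireService"}  # illustrative trust list

def validate_credential(credential: dict) -> str:
    signer = credential.get("signer")
    if signer is None:
        return "no credential present: proceed with caution"
    if signer not in TRUSTED_SIGNERS:
        return f"signed by unrecognized entity '{signer}': proceed with caution"
    return f"signed by trusted entity '{signer}'"

print(validate_credential({"signer": "Adobe"}))
print(validate_credential({"signer": "UnknownApp"}))
print(validate_credential({}))
```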
So, this will serve as a good proxy until you can get that retinal scan?
Hopefully it won’t come to that, as that would certainly be off-putting for many. But there are countries in the EU that already support verified identity. We’re looking at those as models, as well as states within the United States that are moving towards a more digitally secure identity service.
AI and its ability to generate images and videos is complicating this whole process, it would seem. Can you describe how?
Well, AI has reached a level of adoption and sophistication where it’s in the hands of many. And there isn’t much regulation around the world, although there’s certainly an increased effort in the EU and one trending in the United States. And so, at scale, there’s a huge concern that as the technology just continues to get better and better, it’s harder and harder to detect. That’s the biggest concern right now. And so, we offer a proactive way for people to claim attribution and transparency about how something was made. And we think that this is going to be a really powerful way for consumers of content all around the world to be able to look for that provenance data and then make more informed trust decisions about that content.
Maybe this is a little too sci-fi a question, but is it getting closer to the place where it could outsmart you on the user authentication front? Could it generate these triangulated kinds of identities that you verify and make you think it’s an actual person?
I think we are moving in that direction. As these tools get better, the detection mechanisms need to keep up, and generation is going to outpace that effort. I mean, fewer and fewer detection processes will be able to catch this type of content at scale.
Are you building tools that can delineate content that has been built by AI specifically?
Well, in the case of Adobe, where we have our own generative AI platform called Firefly, we have built content credentials directly into the core experience. Adobe is tackling this in a number of different ways, from ethically sourcing the content for training, using Adobe Stock material and openly licensable imagery, to including something that we’re calling an AI disclosure within the content credential itself. Every Adobe Firefly image comes with a content credential that says it was made with an AI tool.
Is this more difficult when some of the content has been created with AI but not all of it?
Yes. There is now in Adobe Photoshop a beta feature called Generative Fill that essentially takes an existing image and allows users to fill in areas of it with newly generated content. It’s also called inpainting, and there are other tools that allow you to do this. As part of the larger initiative, they are also thinking about this type of C2PA disclosure that says some or all of this content was made with an AI tool. On the C2PA side, how we tackle that is by looking at an existing framework created by the IPTC called Digital Source Type. You can say this is a synthetic composite, so we can have a little more nuance in the type of labeling you might expect to see based on how these tools are being utilized.
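The two Digital Source Type values in the sketch below are real terms from the IPTC NewsCodes vocabulary; the surrounding structure and the labeling function are a simplified illustration, not the exact C2PA assertion schema.

```python
# Distinguishing fully generated content from a real photo with
# AI-filled regions, using IPTC Digital Source Type terms.

IPTC_DST = "http://cv.iptc.org/newscodes/digitalsourcetype/"

fully_generated = {
    "action": "created",
    "digitalSourceType": IPTC_DST + "trainedAlgorithmicMedia",
}

inpainted_photo = {
    "action": "edited",
    # a real photo with AI-generated regions filled in (inpainting)
    "digitalSourceType": IPTC_DST + "compositeWithTrainedAlgorithmicMedia",
}

def disclosure_label(assertion: dict) -> str:
    """Map a source-type term to consumer-facing disclosure wording."""
    dst = assertion["digitalSourceType"]
    if dst == IPTC_DST + "compositeWithTrainedAlgorithmicMedia":
        return "Some of this content was generated with an AI tool."
    if dst == IPTC_DST + "trainedAlgorithmicMedia":
        return "This content was generated with an AI tool."
    return "No AI disclosure."

print(disclosure_label(fully_generated))
print(disclosure_label(inpainted_photo))
```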
And that warning or caveat is visible to the user? You’ve got to make sure, of course, that it gets to the consumer when you’re talking about a news context here, because if a consumer can’t see it, then the caveat is meaningless.
Absolutely. So, I think the way to think about content credentials and really the implementation of C2PA data more broadly is that there are multiple parties. There’s the creator side that chooses the types of information they want to include in the content credential, which then appears on the consumer side. The consumer side is really the more challenging aspect to design for because we need to make sure that for the uninitiated, this information is understandable.
There’s also an incredible behavior change involved, which is: how do we let people know that this type of data is available? How do we inform them of the trust model? Through the C2PA UX Task Force, we created a series of progressive disclosure experiences, starting with just an icon that indicates the presence of content credential data, followed by a lightweight summary, which is where you would expect to see that type of disclosure. Then, for those who need to dig in more and see the entire provenance chain, they should be able to do that. And, of course, for the forensic experts who need to see the raw code itself, and really the rest of the rich information that just might not be consumer friendly or understandable, they should be able to do that too.
It seems like there’s a lot of work that needs to be done here, not just in terms of individual newsrooms catching on to this system, but consumer literacy here. And media literacy is already a pretty challenged area, almost everywhere. So, this can’t be too complex of a system for the average consumer to understand.
Absolutely. We like to talk about content credentials from this perspective as being part of a three-legged stool. You have detection, of course, but you have to bolster that with a proactive measure, which is where content credentials come in. The last leg is the need for better and broader digital media literacy that helps people understand what AI is, how it works and where they might encounter it.
And on that front, the CAI has actually created a suite of educational materials for middle school, high school and higher education. We are actively working with academics to create that content and to disseminate it into classrooms around the world.
But that dissemination is tough because there’s not a central United States curriculum. So, you’ve got to do that almost at the level of every school board, you know. Sometimes states, or provinces in Canada, have some media literacy programs in place, but not really at scale almost anywhere. So, that’s going to be a hell of a slog.
I mean, I would say nothing about what we’re working on is easy. But the best part is that there are multiple extremely intelligent individuals from many different companies, covering a wide variety of verticals, all thinking about these problems. It truly has to be an industry-wide effort, but it also requires government support from different countries that can trickle down to, you know, classrooms, academics, researchers. No one company can solve this problem. It really takes everyone to invest.
Do you foresee media companies, actual newsrooms, getting involved in direct consumer education on this front as well? Do you think that they’ll have to absorb part of the burden and go to their viewers or their readers and explain this periodically?
I can’t necessarily speak to their direct relationship to academic settings, but I can say that, again, through the C2PA UX Task Force, one area of recommendations we’re actively working on is how to help implementers talk to their different audiences about what we’re doing. So again, a core concern for us is that we need to make this experience simple and understandable. A lot of research is involved in continuing to optimize for those things, and ultimately we’ll have a set of best practices that we hope implementers can utilize for faster results based on our growing understanding and design.
OK, I’m going to stop drawing you away from your end of the pool quite so far. I want to ask you about Adobe’s Do Not Train tag, which you’ve added for content creators to use if they don’t want AI to train on that piece of content. Can you explain why that would be something that they would want to employ and how that works?
Yes, of course. This is something that was introduced in the C2PA spec, and there are a number of subtle differences in the ways that you may not want your content trained on. But ultimately, we know that from Adobe’s perspective, our audience is creators who work really hard to develop a style and a unique perspective on their art. We want to help them protect that content from web crawlers that are just looking to harvest material to train AI algorithms on. And so, the idea behind Do Not Train is that this would be part of a content credential setting that web crawlers would respect, excluding those images from their training models.
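As a sketch of how a compliant crawler might honor that preference: the assertion shape below is modeled on the C2PA training and data mining assertion, but the field names are simplified and should be read as illustrative rather than the spec’s exact labels.

```python
# A well-behaved crawler checks the credential for an opt-out entry
# before adding an asset to its training corpus.

do_not_train_credential = {
    "assertions": [
        {
            "label": "training-mining",
            "data": {
                "ai_generative_training": {"use": "notAllowed"},
                "data_mining": {"use": "notAllowed"},
            },
        }
    ]
}

def may_train_on(credential: dict) -> bool:
    """A compliant crawler excludes content whose credential opts out."""
    for assertion in credential.get("assertions", []):
        if assertion["label"] == "training-mining":
            entry = assertion["data"].get("ai_generative_training", {})
            if entry.get("use") == "notAllowed":
                return False
    return True  # no credential or no opt-out: crawler's own policy decides

print(may_train_on(do_not_train_credential))  # False: excluded from training
print(may_train_on({}))                       # True: no preference recorded
```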
Is it suggestive, or is it an absolute “you’re verboten to train on it” by the way it’s set up?
I would say, for implementers, it would be a hard preference to respect. But of course, this requires adoption at scale going forward, which, based on our volume of members between the C2PA and the CAI, we do anticipate would cover the majority of places that you go to consume content.
Do you have the sense overall that you are able to keep up with the proliferating ways in which content can be convincingly fabricated?
I hope so. Yeah, certainly we work very closely within Adobe with the teams directly responsible for all of the new Firefly features. Content credentials have been a core part of developing those features and making sure, again, that we’re doing it ethically and with complete transparency.
All right. Well, you are fighting the good fight, Pia Blumenthal, so keep it up. Thanks for being here today.
Thank you so much for having me.
Thanks to all of you for watching and listening. You can watch past episodes of Talking TV on TVNewsCheck.com and on our YouTube channel. We also have an audio version of this podcast available in most of the places where you consume your podcasts. We’re back most Fridays with a new episode. Thanks for watching this one and see you next time.