AI's capacity to cheaply generate realistic images and text is poised to erode
trust in truth. Is there any hope?
In the long term, no. The Nash equilibrium of the game is that the generators
actually match the distribution. The content is, truly, as if it came from a
parallel universe. The generators win and the discriminators lose.
Yet despite the bleak game theory, there is far more cause for optimism in the
near term: just because a fake looks believable does not mean it is not
detectable. To illustrate, let's briefly consider the simplest case, the
deepfake of a coin flip (“coinfake,” if you will): a pseudorandom number
generator.
Simple pseudorandom generators can produce samples which appear to be
independent coin flips at a glance. Yet we can mathematically derive many
statistical properties of sequences of random bits, and by considering many
samples of our coinfake, we can test if they hold; if not, the sequence was
probably faked. Surprisingly, most simple coinfakes -- many of which remain in wide
use today -- fail some of these tests! Of course, the better the generator, the
more tests we need before it will slip up. And while there do exist coinfakes
which pass all known tests (the best being the cryptographically secure
pseudorandom number generators) they require careful design and many layers of
mixing to achieve sufficient chaos. All this to say: convincingly faking even a
coin toss is harder than you might think! To fake a whole picture truly
undetectably would be considerably harder.
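To make this concrete, here is a toy example in Python. The generator and the
two checks are illustrative choices of mine, not any standard test suite: a
coinfake built from the low bit of a textbook linear congruential generator,
compared against Python's own flips.

```python
import random

def lcg_bits(n, seed=12345):
    """'Coinfake': the low bit of a 32-bit linear congruential generator.
    (Numerical Recipes constants; the low-order bit is notoriously weak.)"""
    state = seed
    out = []
    for _ in range(n):
        state = (1664525 * state + 1013904223) % 2**32
        out.append(state & 1)
    return out

def real_bits(n):
    """Reference 'real' coin flips (good enough for this demo)."""
    return [random.getrandbits(1) for _ in range(n)]

def ones_fraction(bits):
    """Check 1: fraction of heads -- should be close to 0.5."""
    return sum(bits) / len(bits)

def adjacent_equal_fraction(bits):
    """Check 2: how often consecutive flips agree -- should also be ~0.5."""
    return sum(a == b for a, b in zip(bits, bits[1:])) / (len(bits) - 1)

for name, bits in [("coinfake", lcg_bits(10_000)), ("real", real_bits(10_000))]:
    print(f"{name:8s}  ones: {ones_fraction(bits):.3f}  "
          f"adjacent-equal: {adjacent_equal_fraction(bits):.3f}")

# The coinfake sails through the first check (exactly half heads) but fails
# the second badly: the low bit of this LCG strictly alternates, so
# consecutive flips never agree, while real flips agree about half the time.
```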
Many of the same principles which apply to detecting coinfakes also apply to
detecting deepfakes. None will prove a silver bullet, but their combination will
tilt the balance back in favor of the discriminators long enough to at least
give society some time to adapt. Here are the most important.
You can't tell without enough data. Just as there's no way to detect a coinfake
from a single flip, there's no way to detect AI-generated images if the
resolution is too low or AI-generated text if the amount is too small. If
ChatGPT writes only a short sentence, it's very difficult -- perhaps even
impossible -- to tell it apart from something a human wrote. But a 200-page
thesis would be relatively easy to distinguish. Likewise with images, it is much more
difficult to detect a 32x32 pixel fake than a 1024x1024 pixel fake. Don't trust
content which is too small, even if it is ostensibly verified.
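To put the "enough data" point in coinfake terms, here is a small sketch. The
52/48 bias is a stand-in for whatever subtle artifact a generator leaves
behind: the same flaw is invisible in fifty flips and unmissable in half a
million.

```python
import math
import random

def z_score(bits):
    """Standardized deviation of the heads count from a fair coin's mean."""
    n = len(bits)
    return (sum(bits) - n / 2) / math.sqrt(n / 4)

def biased_flips(n, p=0.52):
    """A 'fake' coin with a subtle 52/48 bias."""
    return [1 if random.random() < p else 0 for _ in range(n)]

for n in [50, 500, 5_000, 500_000]:
    print(f"{n:>7} flips: z = {z_score(biased_flips(n)):5.1f}")

# With 50 flips the z-score is typically well under 2 -- indistinguishable
# from a fair coin. With 500,000 flips it lands around 28 standard deviations
# out: the same subtle flaw becomes undeniable once there is enough data.
```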
Some of the more important kinds of statistical tests will operate over sets of
samples rather than over individual ones. As a result, it may be impossible to know whether a particular
image or paragraph was faked, and yet have certainty that the collection
contains many fakes.
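A toy illustration of the same idea: suppose a faker plays it safe and makes
every 100-flip batch come out exactly 50/50. No single batch can be flagged
(50 heads is the single most likely outcome), but the collection is far too
tidy.

```python
import random
import statistics

def real_batch(n=100):
    return [random.getrandbits(1) for _ in range(n)]

def fake_batch(n=100):
    """Each fake batch looks fine on its own: exactly half heads, shuffled."""
    bits = [1] * (n // 2) + [0] * (n // 2)
    random.shuffle(bits)
    return bits

def heads_count_variance(batches):
    """Set-level statistic: variance of per-batch heads counts.
    For genuine 100-flip batches this should be near 100 * 0.5 * 0.5 = 25."""
    return statistics.pvariance([sum(b) for b in batches])

real = [real_batch() for _ in range(1_000)]
fake = [fake_batch() for _ in range(1_000)]
print("real collection variance:", round(heads_count_variance(real), 1))  # ~25
print("fake collection variance:", heads_count_variance(fake))            # 0.0

# No individual fake batch is suspicious, yet the collection gives itself
# away: its spread is far too narrow to have come from independent flips.
```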
The set of tests will grow over time. For a long time, the gold standard for
detecting coinfakes was the "Diehard" test suite, released in 1995. However, the
popular Mersenne Twister (MT), developed in 1997, passes all of the Diehard
tests. Since then, new tests (assembled in the larger TestU01 suite) have been
developed which can catch the MT coinfake. Correspondingly, we should expect a
continued arms race between generators and discriminators. Your trust in content
should increase over the years after its publication, because it will have had
to pass discriminators that weren't known at the time of its generation.
One corollary is that there is good reason not to publish all of the
discriminator algorithms, because if they are known then the generators can be
more easily trained to fool them. This has the unfortunate consequence that the
most accurate immediate detections will come from opaque institutions. However,
it is possible that zk-SNARK-based detectors will allow some additional
transparency without divulging the details of the discrimination algorithm.
Furthermore, much of what will protect the best detectors will be their
effectiveness. Discriminators which can be fooled leak information about how
they can be fooled. Perfect and opaque discriminators provide no signal to be
trained against.
Just as with coinfakes, better generators will pass ever more tests. So, over
time, the amount of data and the number of tests required to make a detection
will increase.
The best coinfakes (the cryptographically secure variety) are much more
computationally expensive than regular ones. Similarly, the very best deepfakes
will be much more expensive to produce and much more convincing than regular
ones. I suspect that they will be generated using a combination of high-quality
private input data which is close to the desired product (that is, as part of
the prompt), human input, and private generative models trained with
reinforcement learning using known discriminative algorithms.
For example, if one wanted to convincingly fake Joe Biden eating a sandwich, one
would input into the model both a photo of Joe with his hands near his face and
a photo of a sandwich in the appropriate pose, edit the prediction in
Photoshop, and then pass it back to a final model to integrate.
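Back on the coinfake side of the analogy, the cost gap is easy to measure
directly: Python's default generator is the Mersenne Twister, while the
secrets module draws from the operating system's cryptographically secure
source. The exact ratio will vary by machine; the point is only that the
secure flips cost noticeably more.

```python
import random
import secrets
import timeit

n = 200_000
mt = timeit.timeit(lambda: random.getrandbits(32), number=n)  # Mersenne Twister
cs = timeit.timeit(lambda: secrets.randbits(32), number=n)    # CSPRNG (OS source)
print(f"MT: {mt:.2f}s   CSPRNG: {cs:.2f}s   ratio: {cs / mt:.1f}x")
```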
The distribution of “naturalistic images” can be thought of as the union of many
sub-distributions of varying difficulties. So, your confidence in the truth of
data should depend on its contents. For example, an image of a table with
diffuse lighting on a plain background is far easier to model -- and thus easier
to fake -- than complex 3D scenes lit by bright spotlights.
To witness this in action, ask Stable Diffusion or DALL-E 2 to model either “two
people hugging” or “five people hugging”. They can often model the first prompt
reasonably well, but the second prompt produces Lovecraftian monstrosities.
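If you want to try this yourself, a rough sketch using the open-source
diffusers library follows. The checkpoint name and settings are illustrative
(any Stable Diffusion checkpoint will do, and you'll need a GPU); the point is
simply to compare the two prompts side by side.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (illustrative choice) onto the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for prompt in ["two people hugging", "five people hugging"]:
    # The easier prompt usually comes out passable; the harder one tends
    # to produce extra limbs and merged faces.
    image = pipe(prompt).images[0]
    image.save(prompt.replace(" ", "_") + ".png")
```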