The Generation Game

Feb 19, 2023

AI's capacity to cheaply generate realistic images and text is poised to erode trust in truth. Is there any hope?

In the long term, no. The Nash equilibrium of the game is that the generators actually match the distribution. The content is, truly, as if it came from a parallel universe. The generators win and the discriminators lose.

Yet despite the bleak game theory, there is far more cause for optimism in the near term: just because a fake looks believable does not mean it is undetectable. To illustrate, let's briefly consider the simplest case, the deepfake of a coin flip (“coinfake,” if you will): a pseudorandom number generator.

Simple pseudorandom generators can produce samples which appear, at a glance, to be independent coin flips. Yet we can mathematically derive many statistical properties of sequences of random bits, and by considering many samples of our coinfake, we can test whether they hold; if not, the sequence was probably faked. Surprisingly, most simple coinfakes -- many of which remain in wide use today -- fail some of these tests! Of course, the better the generator, the more tests we need before it slips up. And while there do exist coinfakes which pass all known tests (the best being the cryptographically secure pseudorandom number generators), they require careful design and many layers of mixing to achieve sufficient chaos. All this to say: convincingly faking even a coin toss is harder than you might think! Faking a whole picture truly undetectably would be considerably harder.
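To make this concrete, here is a toy sketch of a coinfake and two statistical tests. The generator is a classic linear congruential generator (the constants are the familiar glibc-style ones, used purely for illustration); its lowest bit passes a naive frequency test yet fails a runs test spectacularly:

```python
def lcg_low_bits(seed, n):
    """A "coinfake": the lowest bit of a linear congruential generator.

    Because the modulus is a power of two (and a, c are odd), the low
    bit strictly alternates 0, 1, 0, 1, ... -- a dramatic statistical tell.
    """
    state = seed
    for _ in range(n):
        state = (1103515245 * state + 12345) % 2**31
        yield state & 1

bits = list(lcg_low_bits(seed=42, n=10_000))

# Test 1 (frequency): the fake passes -- exactly half the bits are 1.
ones = sum(bits)

# Test 2 (runs): fair flips give about n/2 runs of identical bits;
# strict alternation gives n runs.
runs = 1 + sum(a != b for a, b in zip(bits, bits[1:]))

print(ones, runs)  # prints: 5000 10000 -- the runs test exposes the fake
```

A single flip from this generator is a perfectly plausible coin toss; only the pattern across many flips gives it away.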

Many of the same principles which apply to detecting coinfakes also apply to detecting deepfakes. None will prove a silver bullet, but their combination will tilt the balance back in favor of the discriminators long enough to at least give society some time to adapt. Here are the most important.

  1. You can't tell without enough data. Just as there's no way to detect a coinfake from a single flip, there's no way to detect AI-generated images if the resolution is too low or AI-generated text if the amount is too small. If ChatGPT writes only a short sentence, it's very difficult -- perhaps even impossible -- to tell it apart from a human's. But a 200-page thesis would be relatively easy to distinguish. Similarly with images, it is much more difficult to detect a 32x32 pixel fake than a 1024x1024 pixel fake. Don't trust content which is too small, even if it is ostensibly verified.
  2. Some of the more important kinds of statistical tests will operate over sets of samples rather than over individual ones. As a result, it may be impossible to know whether any particular image or paragraph was faked, and yet be certain that the collection contains many fakes.
  3. The set of tests will grow over time. For a long time, the gold standard for detecting coinfakes was the “diehard tests,” released in 1995. However, the popular Mersenne Twister (MT), developed in 1997, passes all of the diehard tests. Since then, new tests (assembled in the augmented suite TestU01) have been developed which can catch the MT coinfake. Correspondingly, we should expect a continued arms race between generators and discriminators. Your trust in content should increase over the years after its publication, because it will have to pass discriminators that weren't known at the time of its generation.
    1. One corollary is that there is good reason to not want to publish all of the discriminator algorithms, because if they are known then the generators can be more easily trained to fool them. This has the unfortunate consequence that the most accurate immediate detections will come from opaque institutions. However, it is possible that zk-snark based detectors will allow some additional transparency without divulging the details of the discrimination algorithm.
    2. Furthermore, much of what will protect the best detectors will be their effectiveness. Discriminators which can be fooled leak information about how they can be fooled. Perfect and opaque discriminators provide no signal to be trained against.
  4. Just as with coinfakes, better generators will pass ever-more tests. So the amount of data and the number of tests required to make a detection will increase over time.
  5. The best coinfakes (the cryptographically secure variety) are much more computationally expensive than regular ones. Similarly, the very best deepfakes will be much more expensive to produce and much more convincing than regular ones. I suspect that they will be generated using a combination of high-quality private input data which is close to the desired product (that is, as part of the prompt), human input, and private generative models trained with reinforcement learning using known discriminative algorithms.
    1. For example, if one wanted to convincingly fake Joe Biden eating a sandwich, one would input into the model both a photo of Joe with his hands near his face and a photo of a sandwich in the appropriate pose, edit the prediction in Photoshop, and then pass it back to a final model to integrate.
  6. The distribution of “naturalistic images” can be thought of as the union of many sub-distributions of varying difficulties. So, your confidence in the truth of data should depend on its contents. For example, an image of a table with diffuse lighting on a plain background is far easier to model -- and thus easier to fake -- than a complex 3D scene lit by bright spotlights.
    1. To witness this in action, ask Stable Diffusion or DALL-E 2 to model either “two people hugging” or “five people hugging”. They can often model the first prompt reasonably well, but the second prompt produces Lovecraftian monstrosities.
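The role of sample size (point 1) and of set-level statistics (point 2) can be sketched with a hypothetical coinfake biased just 2% toward heads. Any single flip -- and even a short sequence -- is indistinguishable from a fair coin, but a simple z-test over a large enough collection detects the fake with near-certainty:

```python
import math
import random

random.seed(0)

def coinfake(n):
    """Hypothetical coinfake: heads with probability 0.52 instead of 0.5."""
    return [random.random() < 0.52 for _ in range(n)]

def z_score(flips):
    """Standard deviations separating the heads count from a fair coin's."""
    n = len(flips)
    return (sum(flips) - 0.5 * n) / math.sqrt(0.25 * n)

# 50 flips: the bias is buried in sampling noise; |z| is typically well
# under 2, so the sequence looks fair.
print(abs(z_score(coinfake(50))))

# 50,000 flips: the expected z is about 0.04 * sqrt(50_000) ~ 9, far
# beyond any plausible fluctuation of a fair coin.
print(abs(z_score(coinfake(50_000))))
```

The same logic underlies set-level deepfake detection: no single sample is damning on its own, but the aggregate statistics of a collection can be.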

DALL-E's Lovecraftian Hug
Example generations by DALL-E 2 for the prompts “two people hugging” and “five people hugging”. One is easier to detect than the other.