
Muse: Before You Tweet
Hello there! Today Microsoft Research announced the release of a new paper about generating gameplay footage. A lot of people had a lot of takes about this, but there were one or two common misconceptions I wanted to point out. I hope you find this interesting!
The System Is Not "Generating Gameplay" or "Ideas"
The title of the paper uses the phrase 'ideating gameplay', which is technically a description of what is happening in the paper, but is pretty easy to read as "this is an AI that generates games", which it isn't. I'm going to include a short summary of what they did here, just to give you the gist.
- They trained a model on years and years of video footage of people playing an action game. This model can generate new video that looks sort of like the game being played. You can give it a screenshot of the game and say "show me what you think happens next", and it will generate a video clip.
- They made a tool that lets game developers edit a game level using existing game concepts, like adding a jump pad where there wasn't one before.
- They then gave this new level to their model, and asked it to show what it thought the footage of a player playing from this new position would look like.
So the idea here is that the system has internalised how people play this game, and also how they interact with the things it already knows about (like jump pads). The premise is that a designer could then make an adjustment to a level, ask "how would this scenario play out?", and see some video showing what the model thinks would happen. We'll get to whether this is actually a good idea a bit later, but this is not (for now) an attempt to generate new games, or even new game ideas, or even new parts of game ideas, really. The ideation in the title is happening inside the human using the tool.
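To make that loop concrete, here's a minimal sketch of the workflow in Python. Everything in it is my own invention (the names, and a placeholder model that just returns noise so the code actually runs); the real system is a large generative model trained on all that footage.

```python
import numpy as np

class WorldModel:
    """Stand-in for a model trained on years of gameplay footage.

    The real thing is a large generative model; this placeholder just
    returns noisy copies of the input so the loop below actually runs.
    """

    def predict_next(self, frame):
        # A trained model would return its best guess at the next frame.
        noise = np.random.normal(0, 5, frame.shape)
        return np.clip(frame + noise, 0, 255)


def rollout(model, start_frame, n_frames=30):
    """Ask the model: 'show me what you think happens next', n times over."""
    frames = [start_frame]
    for _ in range(n_frames):
        frames.append(model.predict_next(frames[-1]))
    return frames


# The designer workflow, roughly: take a frame of the game, edit it
# (e.g. paste a jump pad into the scene), then generate speculative
# footage of how play might continue from that edited state.
screenshot = np.zeros((180, 300, 3))          # stand-in for a game frame
screenshot[100:120, 140:160] = [255, 128, 0]  # stand-in for the pasted jump pad
clip = rollout(WorldModel(), screenshot)
```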
The Paper Is Not Really About "Generating Gameplay" or "Ideas"
In my opinion, the paper isn't even about this. The paper's actually a study of how human designers think about working with generative AI tools. The authors talk about three properties they believe the tools need: persistency (i.e. if I add a jump pad and ask what happens next, the AI tool can't delete the jump pad), consistency (i.e. if the model shows players reacting to something in a certain way, they should react in a similar way the next time the same scenario arises), and diversity (the generative model's idea of how players behave should cover a broad range of behaviours that it saw in the input data). I'd argue that these properties are not necessarily sufficient, and some of them are only needed because of generative AI's inherent weaknesses in the first place. But really, I think the paper is about this stuff - it's about these researchers thinking about the implications of how people will work with these tools.
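To illustrate what those properties mean in practice (this is my own toy example, not anything from the paper), here is roughly what persistency and consistency might look like if you reduced them to crude pixel checks:

```python
import numpy as np

def edit_persists(generated_frame, region, edit, tol=10.0):
    """Persistency, crudely: does the designer's edit (say, a jump pad
    pasted into `region`) still appear in a generated frame?"""
    y0, y1, x0, x1 = region
    return float(np.abs(generated_frame[y0:y1, x0:x1] - edit).mean()) < tol

def rollouts_agree(frames_a, frames_b, tol=10.0):
    """Consistency, even more crudely: do two rollouts from the same
    edited frame stay close to one another? A real evaluation would
    compare behaviour, not raw pixels - this just shows the shape of it."""
    return all(float(np.abs(a - b).mean()) < tol
               for a, b in zip(frames_a, frames_b))
```

Diversity doesn't reduce to a pixel check at all, which is part of why I come back to it further down.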
I mention all this because, although I'm critical of this technology, I think it's a good thing to see researchers sitting down, conducting user studies with game developers and actually engaging people in the process. One major criticism I will make here is that they selected for game developers who were already using AI or intended to use it - I think proponents of this technology also need to sit down with people who hate it, or who have used it and didn't like it, to truly understand the range of experiences and difficulties. I don't expect it to trigger an enormous change in philosophy or aims within Microsoft Research, but it is nevertheless better than big tech companies issuing commandments about what people are going to use AI for.

The Screenshots Are Fuzzy Because The System Is New
Just a small note here, because I saw some tweets talking about how 'bad' the outputs looked. When we build AI tools to work with images and video, the resolution of the data steeply changes the computing power needed: pixel count grows quadratically with image side length, and model costs grow at least as fast as pixel count. Machine learning systems often downscale images to work on them at smaller scales, before upscaling their outputs again at the other end. However, sometimes we don't even want to start or finish with high-resolution images, particularly in research where the work is new. The output resolution for this system is low because, well, why not make it low? The paper is testing out ideas about getting people to work with AI tools, and it doesn't need higher resolution to do that. We could also argue that since the tool is designed to show people how a scenario might play out, it doesn't actually need to be high resolution (ignoring Phil Spencer's idiotic comments about how this means we can emulate any game now from some video footage).
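As a rough sense of the scale involved (the output resolution I'm using here is from memory, so treat it as approximate):

```python
# Pixels per frame at a low research resolution vs. full HD. Compute
# grows at least linearly with pixel count, and much faster than that
# for attention-based models, so low-res output is a sensible saving.
low_res = 300 * 180       # 54,000 pixels - roughly the scale of the paper's outputs (from memory)
full_hd = 1920 * 1080     # 2,073,600 pixels
print(full_hd / low_res)  # ~38x more pixels per frame, before anything else scales
```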
The System Is Not A Preservation Tool
Phil Spencer said this technology might mean you could emulate any game on any platform by training a model on some footage of it, and that this was good for game preservation. I mean, in a sense anything is a preservation tool. I could ask my friend's five-year-old son to draw a crayon picture of what he thinks the ending cutscene of Final Fantasy VIII looks like, and that would still count as game preservation of a certain sort. So in a sense, yeah, train an AI model on game footage and you can stick that in an archive somewhere. The reason I say this isn't a game preservation tool is that Microsoft's claim that generative AI can help preserve games comes with a lot of weighty implications and wink-nudging behind it, a lot of eyebrow-waggling implying that this model somehow preserves a game's ground truth all by itself, which I think is flawed at best.
Ten years into the boom, we still do not have a good way to express or measure what a model captures and what it does not. We do not have a good way to assess whether its behaviour matches the behaviour of its real-world counterpart. The model in this paper was trained on years and years of video footage of a single, fairly simple game. Even this model will not capture everything the game contains, or everything players can do - and I've already talked about how things like the Minecraft Oasis demo clearly fail to capture anything not happening on screen. This is absolutely not a solution for game preservation. There are things it could be useful for - if the understanding of its imperfections is there. If you've ever seen those weird news articles about how archaeologists have reconstructed what Julius Caesar looked like, and it turns out he looked like a sort of Wallace and Gromit figurine that has melted slightly, I could imagine models having a use like this. People know this is not really what Julius Caesar looked like. Such models can provide plausible reconstructions of something, which we can take with a huge pinch of salt and a load of caveats, and enjoy as a sideshow to actual preservation efforts, maybe.
But preservation is not just about gameplay, and it's not just about video. Florence Smith Nicholls wrote a report on this very topic while working at the British Library in 2023, where they ran a project to archive several digital games. What does it mean to preserve a gameplay experience? Even if this model were a perfect replication of the original executable software, that would not be the be-all and end-all of game preservation. A generative model of what a game's footage might once have looked like could be a nice curio on the side of a real preservation process, but it is always going to be inferior to the other ways we approach the problem.

This Is Not A Practical Process, I Don't Think
The situation here is simple: the system can just about understand how this relatively straightforward game works, after looking at seven years of example data. It's impressive that it can do this using visual information, because things like lighting, camera angles and user interfaces are a lot for an AI model to handle. But ultimately, even with all of this data, all the time spent annotating datasets, and so on, it was still only just about able to generate footage predicting player behaviour. If we think about this in the context of modern game development, it's clear that this isn't a very practical idea right now - most developers couldn't afford to do it, and the ones that could don't have years and years of player data to show an AI, because by definition, if they did, their game would already be out.
As with all modern AI research, I think it's not necessarily an important criticism to make, because a lot of research is speculative. Most of my research is also not ready to be implemented by game developers, but that's not why I do it - I do it to explore ideas. The research team behind this probably believe it will get more efficient over time, which might make it more affordable or tractable for small developers. However, it still raises the question of how we get video footage of people playing our game in the first place. If you've been in development for a couple of months then you won't have enough footage, and even if we make the systems able to run on less input data there must be a minimum level required to understand the full game logic. So I think there is a question here not just of whether it makes sense as a tool now, but whether it can ever make sense.
There are a million other questions beyond this, of course. I mentioned earlier the ideas of persistency, consistency and diversity from the paper. Diversity is a particularly tricky one, I think, for two reasons. First, although the researchers measure how well their generated video matches real player behaviour, this assumes that your footage accurately represents your real playerbase. Suppose all your footage came from the beta branch of your game: you might only be recording a certain kind of player. Diversity in the model only guarantees agreement with the data - not reality. The second, much bigger issue is that design ideation is explicitly about thinking up scenarios that change player behaviour. That's what game design is. If I add something genuinely new, the model is unlikely to be any better at speculating how players might respond than I would be just imagining it myself.
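To put that first problem in toy form (the action names and numbers here are entirely invented): a model can perfectly match the distribution of behaviours in its footage and still know nothing about the players it never saw.

```python
from collections import Counter

def action_distribution(action_log):
    """Normalised frequency of each recorded player action."""
    counts = Counter(action_log)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}

# Invented data: footage from a beta branch skews towards one player
# type, so a model matching it perfectly still misses the live playerbase.
beta_footage = ["jump", "jump", "shoot", "jump", "dash"]
live_players = ["idle", "explore", "jump", "shoot", "idle", "chat"]
print(action_distribution(beta_footage))  # heavy on 'jump'
print(action_distribution(live_players))  # behaviours the footage never showed
```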
But we're getting way ahead of ourselves here and way into the weeds of this approach, into criticisms that are probably only interesting to about a dozen people in the entire world.

I Think AI Designing Games Is Cool
This paper is not about AI generating games, but a lot of people thought it was and tweeted about how much they hated the idea. I say this as someone who has spent fifteen years thinking about how AI can design games: I think it's cool, when done correctly, and the correct way to do it is to make something stupid, slow, cheap, weird and broken. I think systems like this can inspire people to make interesting stuff, I think they can make alien games that humans would not, I think they can be fun art pieces and interesting engineering challenges. I think when we feel we have ownership over technology - when we can build things ourselves, at small scales, in control of what it does and in charge of its scope and potential harms - then we feel a lot happier opening ourselves up to ideas like "what if I used algorithms to generate game levels for me?"
This month I'm finishing up writing a book about procedural generation and games - not as a technical guide, but as a pop science book exploring what is artistic, beautiful and inspiring about it. There is just one chapter on machine learning, because there are a few important things I want to say in there, but most of the book is about the rich and exciting history of getting computers involved in our creative processes, which has been going on for decades. I understand people are jaded, hurt and scared after years and years of complete bullshit from the worst people imaginable. But I hope we can also reclaim and protect the idea of weird computer things. I think being weird with technology is one of the strengths that independent creators have, one that can't be encroached upon by corporations or tech companies, because weirdness is unreliable, risky and a bad investment.
If you want to watch me talk a little bit about some of my ideas in this space, I gave a talk about a (relatively) recent project of mine a couple of years back. You can watch it here. I'm still working in and thinking about this space, but sadly I have a lot less time than I used to. I might have a little something to show later this year though.
I hope this maybe helped clear up some confusion about a couple of elements of the paper, and gave you a couple of different things to think about. Thanks for reading!
Posted February 19th, 2025