Automating Victory: Beating browser games with accessible Python (Part 1)
A procrastinator’s guide to project management.
Here’s how I automated beating one of my favorite games using Python, how it feels to be an unofficial tool-assisted speedrunning champion and some outlines for how you can join me at the top.
There goes my free time, and maybe yours!
Actually, unless you are my coworkers who saw my presentation during MTS 101 you probably don’t know this game, but I was hooked on it. The mobile game thankfully maxes out its in-game timer at 9999 seconds, so I don’t really know how much time I’ve spent playing. I’ve seen the timer get that high a few times while trying to beat the one-hit-means-game-over mode, usually followed by the game-over screen…
The toughest part of that mode isn’t the math or the strategies but user error. I’ve had a lot of games go for an hour or more only to be cut short by a careless mistake or, worse, guessing wrong on a 50/50 shot. That hurts. And like all good programmers, I decided to soothe my pain with cold, unfeeling technology. Let’s look at what I was up against!
Meet the monsters:
The game is a cross between Minesweeper and an RPG where the goal is to reveal all tiles on the board. But instead of mines, you have to watch out for monsters.
Unlike Minesweeper, where clicking a mine is an instant loss, you can click any unrevealed monster at or below your level to beat it. You’ll reveal the tile, gain some experience, and eventually level up so that you can click on higher-level monsters. If you click on a monster that’s too high-level, you’ll lose hit points and eventually lose the game.
Like Minesweeper, the number on an empty tile gives you a clue as to its neighbors. Unlike minesweeper, the number is the sum of the levels of the monsters around the tile. Here, the middle number is 3 because it has one level 1 neighbor and 1 level 2 neighbor.
In regular Minesweeper, there are 56 possible ways to arrange the mines around a “3” tile. But here, it could be made up of three level 1 monsters, a level 1 and a level 2 monster, or a single level 3 monster. The math works out to being any of 120 possibilities. That’s way more ambiguity to deal with!
On the other hand, because you can clear out monsters as you level up, the ambiguity on the board shrinks over time. As such, the difficulty comes down to board size and monster density. Difficulties range from Easy to Huge x Extreme (Dev’s description: やらないほうがいい / You’re better off not playing…) That mode is hard, but I had an idea on how I could get through it with much less work. All it would take is a bunch of work…
Minimum Viable Pain-in-the-neck:
My initial idea of how this script would run was to do what I do, only better. I’d run a loop of “look → think → click” until I either won or lost. This rough draft idea had a lot in its favor:
• Look at the board :
Pixel art is easy to parse and compress
• Build a virtual model :
It’s a grid of numbers. Computers love grids of numbers
• Identify safe moves :
This step is math. Computers love math
• Make safe moves :
This isn’t a shooter. Just click on stationary squares
• Repeat :
Crucially, there’s almost no information that needs to be recorded across rounds.Looking at the board always gives all the information on the state of the game
The biggest hurdle was that the developer designed the game to be exclusively played by humans. This immediately brings us to the first takeaway that helped me actually finish this: rush to the minimum viable product (MVP).
An overall MVP for the script might be something that automatically clicks random spots on the board but that represents success in every single bullet point above. That’s a lot of work for something that won’t even win the game! Skipping straight to that is painful and discouraging in the sort of way that kills personal projects.
Each step, however, also has an MVP that builds off of previous successes. For the first “look” step, I leveraged the MSS and OpenCV libraries to take screenshots and identify a reference point to find the game window. How long does that take to run? Who cares! It saves an image that I can work with.
Next up: building a virtual model. Terrible step name, way too broad of a scope. I’ll just figure out what level I am for now. Optical Character Recognition is a well-studied field with very approachable libraries and machine learning models and I’m not going to use any of that. This is just pixel art! I’ll check if the ~40 pixels match a reference image. Figuring out what else we’ll need to parse is a job for Future Jon.
Bad news: it’s immediately time to be Future Jon. We have to figure out the grid’s location to click on it, but because we’ve focused on building a working solution up to this point we can start with a script that’s already running. Even if the solution here is brittle, we can fix it down the line if needed (as Future Future Jon). From there, I leveraged the PyAutoGUI library to click the grid at any coordinates I wanted.
Finally, for repetition, we can just loop through the steps we’ve already taken and click each time randomly. Unsurprisingly, this strategy fails a lot. But it fails in ways that we can gradually refine as a working product rather than trying to pre-refine an ambiguous goal into perfect code.
A few revisions later and I was ready to start directly translating human strategies into code.
Moving Very Promptly to a Medium Viable Product
Remember earlier when I said to rush for an MVP? Well, that was useful for the grunt work but for the actual thinking parts of the project, my second takeaway was the opposite: optimize the easy stuff to understand the tricky stuff.
Follow your passions to figure out when to push through to an MVP and when to find something elegant. I wanted to make a script that could beat the game. The image parsing and automation work it took was just a means to an end.
Meanwhile, I actually really cared about implementing the strategies I used to beat the game. The simplest strategy you could actually use to play this game is something like “if a revealed tile’s number is less than or equal to your level, all of its unrevealed neighbors can be safely clicked.”
Translated to pseudocode, that might look something like:
for row in grid:
for column in row:
if grid[row, column] ＜= level:
for neighbor of (row, column):
if neighbor is unclicked
Ugh, an ugly number of indentations, and Python is terrible at nested for-loops. Writing this would work, but I know just enough about NumPy and scikit-learn to get me into trouble. Thanks to some research and experimenting, I was able to do the exact same thing without any for-loops, netting a 20x speedup!
Anyway, that doesn’t translate to anything noticeable in performance. The loop spends 80% of its time clicking, 15% of its time taking screenshots, and only 5% of its time executing these strategies. Even if I could eliminate the time the script spends calculating its next move, it can only get 5% faster. Oops. Wasted effort, should have taken my own advice and stuck to the MVP, right?
Figuring out the more sophisticated version of the trivial strategy was worth the extra time investment because the more advanced strategies build off of this simple one. The next strategy I implemented does something similar as the first step in a multi-step operation and runs recursively. Spending the effort to get a 20x speedup where it doesn’t matter taught me lessons that resulted in a 200x speedup where it does.
Ultimately, when it’s something you care more about, it’s easy to spend the effort learning and getting better. And really, isn’t that really why we’re here?
Hold on, wait, why are we here?
Glory, pure and simple
Most games list records for how quickly they’ve been beaten on speedrun.com and Mamono Sweeper is no exception. I wanted those top spots. I watched these videos of the most skilled players beating this game in minutes but only saw the lackluster failings of pitiful human bodies. I could do better. Rather, my script could do better. Here’s a link to the video of the record-holder of easy mode beating easy mode in a mere 19 seconds:
And here’s my script beating easy mode.
Caution: flashing lights and blinding speed
Records:Difficulty / Best human time (MM: SS) / Best computer time (MM: SS)
• Easy/ 0:19 / 0:01
• Normal / 1:44 / 0:06
• Huge 6:23 / 0:14
• Extreme/ 3:43 / 0:08
• Blind / 2:06 / 0:04
• Huge x Extreme / 12:54 / 0:21
• RHuge x Blind / 9:59 / 0:13
One of the biggest takeaways for me was that when shortcuts and the joy of learning aren’t enough, having an end goal will keep your project going. Even though there’s shockingly no leaderboard on speedrun.com for tool-assisted Mamono Sweeper records, getting these times as low as possible kept me opening my code editor. Beating the “blind” difficulties, where clicking a single monster means game over, required some of the cleverest code I’ve written, but I only wrote that code because I defined success as beating all seven difficulties.
All in all, I took every record down to mere seconds, took my abilities to their limits, and took a look at some of the cool things that Python can do. I’m proud of the outcome. You might even wonder whether I’ve made this entire blog post for bragging about my good script.
Of course not.
This script isn’t good, it’s the best.
he best that exists, anyway, but it’s not perfect. It’s got a couple of lingering bugs, there’s another second or two that could be squeezed out of the records, and its win/loss ratio could be improved, but I’m happy for now.
If you’ve read through this and gotten fired up to automate the hard-to-automate in your life, stay tuned for the part 2 follow-up to this blog post. I’ll provide hackable code samples, go in-depth on the nerdier details of translating my strategies to code, and talk about some of the unsettling security implications of this.
Until then, thank you for reading.
At HENNGE, we are looking for software engineers to join our Cloud Product Research and Development Team. To find out about more opportunities at HENNGE, check out our recruiting site.