| FPGA implementation
The Game Of Life can generate some pretty interesting patterns.
However it might take many generations (grid updates) for evolving some of the patterns.
So a question one can ask is how fast can a FPGA do the necessary calculations. It should be fairly efficient as the cells are very simple state-machines (only 2 states).
I was curious about this so I made an implementation of game of life for the Turbo Chameleon 64 platform.
Probably the most efficient way to do the simulation would be to map it onto the lookup tables of the FPGA.
Both the life simulation and the FPGA LUTs represent a regular grids of cells.
Each cell in life has eight neighbors and must depends on its own state as well, so the state-machine that needs to be implemented has 9 inputs and 1 output.
With a little luck it can be squeezed it into couple of 4-input-LUTs and a D-FF.
The problem with this direct approach is the additional logic that is necessary for pattern editing and video output is not taken into account.
It is not immediately clear how logic required for editing and display would be connected when the simulation is distributed around the FPGA.
Even if that is somehow solved, the resulting grid will be rather small (maybe a 100 by 100 grid at maximum on a medium size FPGA).
A second approach is using external memory (like SDRAM).
Here the simulation speed is mainly constraint by the speed of the used memory.
Some of the memory bandwidth will also be taken by the video output.
The simulated world however could now be a lot bigger.
However reading all of the memory on each video frame doesn't seem likely.
So only a small part of the simulation can be made visible at any time, somewhat defeating the purpose of the large memory.
So the method I've choosen for my implementation is using the interal block-RAMs of the FPGA to store the grid state.
The block-RAMs offer large bandwidth to the data as all the block-RAMs can be accessed at the same time.
This results in a high iteration speed.
Also the block-RAMs allow easy inplementation of the editor and display functions as the grid content can be accessed as a linear array.
The Algorithm
On each clock tick the state of a complete row of cells is read from the block-RAMs, a total of 512 bits in a single clock.
The value of 512 bits is somewhat arbitrary.
It is however a nice power of two (2^9) and the resulting grid nicely fits on a standard VGA resolution (640x480) with some room left over for a menu on the right side.
The absolute maximum is 1188 bits when using all 66 block-RAMs that are available in the Cyclone III FPGA as used on the Chameleon target.
The number of rows is even more arbitrary set at 472 so there is some space for display of a border on the screen around the grid.
The steps to calculate the next grid is in principle quite simple.
Two rows are stored already in flipflops inside the FPGA.
A third row is read and the middle one of the three rows is recalculated and written to the block-RAMs.
Then a shift of data takes place, the second row becomes the first and the third becomes the second.
Now the next row can be read into the third place and the new state for the second row which was read the previous cycle is calculated.
This continues until the end of the screen is reached. There are however a few additional details that we need to keep track of, which I will explain next.
The algorithm described above has one flaw. It is not obvious how the first and last row are updated.
We could ignore it and waste two rows (which isn't so bad in itself). However the life simulation will behave incorrectly when a pattern hits the edge.
My solution is making the world a torus (the bottom edge connects to top edge and the left edge connects to right edge).
So the first row uses the state of the last row to update, which gives us the addional state we need to update the first row.
Ofcourse that means the oposite is true as well, the last line uses the first line to update its state.
However by the time we reach the last line, the state of the first lines in memory are already on the next iteration.
The life simulation depends on the fact that all cells update at the same time.
To solve this an extra row of flipflops is added to store the previous state of the fist top row (called game_buffer_0 in the vhdl source).
Now when the last row is processed instead of retrieving the first row from block-RAM, it is copied from the game_buffer_0 flops.
The left and right connection is ofcourse much simpler to implement, as this can be handled on a row basis. There is no need to store any extra state.
VGA output
There are many ways the VGA controller could get access to the block-RAMs.
The implementation I made has a 512 bits shift register. It uses the same memory interface as the simulation.
At the beginning of each visible scanline the VGA will interrupt the simumation and read one row from memory into a shift register.
Then it quietly shifts out 512 pixels before it needs to read a row again.
Only a few thousand accesses need to be made per second as there is only one access necessary per scanline.
This has a very low impact on the simulation speed as the frequency of these interruptions is so low.
User Interface and Editing
The editing of the grid shares the same memory interface as the VGA and simulation.
When a grid cell is selected the complete row corresponding with that cell is read from memory.
The state of the selected cell is toggled and then the complete row is written back into memory.
Normally this is done when the simulation is paused, but it is possible to edit while the simulation is running.
Although the hardware alternates between running and editing this switch is so fast (about 50 nano seconds) the user will experience it as happening at the same time.
Next to the gid is a simple UI displayed consisting of a collection of buttons.
The buttons at the top right of the screen change the simulation speed.
It can be changed in a couple of steps from paused to running at full speed.
At the highest setting the simulation runs many times faster than the VGA monitor can display.
Next are two columns of 9 buttons. These define the rules of the simulation.
The left column defined the rule for dead cells, the right column for cells that are alive.
The way these buttons works is that they represent the state change for a certain number of neighbors.
When the button is filled closed a cell with that many neighbors becomes or stays alive.
If the button is open a cell with that many neighbors will die.
After startup the buttons for implementing the standard rules of Conway's game of life are selected. |
 |
Conclusion
I've implemented Conways game of live in an FPGA. Using the high bandwidth of the internal block-RAMs allows very fast simulation speeds.
The simulation using a modest 100 Mhz clock can process about 25 million rows per second.
Which translates to 12.8 giga cells updated per second or over 50 thousand grid iterations per second.
It allows the simulation rules to be changed and the grid edited on the fly.
The output is on a standard VGA screen and it is controlled with a PS/2 mouse.
In the current configuration only about half of the block-RAM capacity is currently allocated for the particular FPGA used.
One possible extention is going to 2 bits per cells, so the rules can be made more complex.
The VGA controller would need to be updated to show gray-scale (or different colors) for the 4 states a cell can take.
Fetching 1024 bits per row is still below the maximum of 1188 a single ported design can provide.
So this change would not reduce the simulation speed.
|
| The Rules
Although it is called "Game of Life" it is not really a game in the usual meaning.
It is a cellular automaton that emulates articificial life on a grid world.
So the "game" is feeding it patterns and see how it evolves over time.
There are a few simple rules that define the simulation.
- Each grid cell has one bit of state, representing the cell being dead or alive. All cells in the world are updated at the same time.
- If a cell has none or one neighbor cell that is alive it dies (due to loneliness).
- If a cell has four or more neighbor cells that are alive it also dies (over-population).
- So with two or three neighbors alive, a cell will stay alive as well.
- If a cell is dead and has exactly three neighbors that are alive it becomes alive itself.
These simple rules can result in fairly complex behavior on the grid. Getting complex patterns out of a set of simple rules is called emergent behavior.
Apparently it is even an universal turing machine, meaning the Game Of Life is an universial computer and can do anything any other digital computer can do if the grid is made large enough.
While interesting from a theoretical perspective, the reason for making the FPGA implementation was mostly for the interesting pictures it generates ;-)
|
|