Welcome, dear readers!

In our previous article, we discussed TrueNorth — IBM's neuromorphic processor that proved the feasibility of energy-efficient data processing based on the principles of biological neural networks. However, the journey of neuromorphic computing doesn't end there. Other projects offer alternative approaches to creating a "computational brain." One such project is SpiNNaker — a scalable platform that combines parallelism with an event-driven signal processing paradigm. In this article, we’ll explore the evolution of SpiNNaker — from its early concepts to the modern SpiNNaker2 — and analyze the key architectural features that make this platform unique among neuromorphic systems.

SpiNNaker (Spiking Neural Network Architecture) is a massively parallel computing architecture inspired by the working principles of the human brain. The system consists of a vast number of simple computational elements (up to a million ARM cores in the largest first-generation machine) that communicate using biologically inspired impulses (spikes), enabling real-time simulation of spiking neural networks (SNNs).

Project Goals:

  • To create a high-performance, massively parallel computing platform suitable for real-time simulation of large-scale neural networks, including sections of the human brain.

  • To explore new principles in computer architecture that could revolutionize the design of energy-efficient and high-performance computing systems.

Work on the first SpiNNaker prototypes began in the late 1990s at the University of Manchester under the leadership of Professor Steve Furber. Years of research culminated in the development of the second-generation system — SpiNNaker2 — which builds upon the original principles while significantly expanding the platform's capabilities. Let’s dive into the major milestones of this journey:

Early Concept of SpiNNaker (Late 1990s – Early 2000s)

The idea of SpiNNaker emerged as a hardware platform aimed at mimicking biological neural processes in real time. Professor Steve Furber and his colleagues at the University of Manchester leveraged their experience in developing ARM architecture to explore spiking neural networks (SNN) and methods for their implementation on specialized neuromorphic hardware. In response to the growing demand for more energy-efficient and biologically plausible models of brain activity, the team conducted experiments with small-scale neural networks on general-purpose CPUs. These experiments demonstrated that "neuron-like" processing not only could but should be offloaded to specialized architectures to achieve the high performance and realistic parallelism required for brain modeling.

First SpiNNaker Prototypes (Mid-2000s – Around 2009)

The next stage saw the introduction of the first ARM-based SpiNNaker chips, equipped with specialized spike-routing systems. The initial version of SpiNNaker was manufactured using a 130nm process. The primary goal was to test the GALS (Globally Asynchronous, Locally Synchronous) architecture and the Address Event Representation (AER) protocol. GALS enabled efficient synchronization of local regions within the system without the complexity of global clock signals, while AER facilitated the transmission of "spike events" with minimal latency. Special attention was given to spike transmission quality — measuring delays, assessing stability under high data loads, and ensuring reliability under parallel workloads. To simplify system management and data analysis, the team developed visualization and modeling tools, allowing researchers to track spike activity in real time and configure network structures with ease.

Scaling Up and Joining the Human Brain Project (Late 2000s – Early 2010s)

As the project progressed, developers increased the number of processors per chip and interconnected them into multi-board clusters, forming larger fragments of "artificial neural tissue." One of the pivotal moments in the project was its collaboration with the European Human Brain Project (HBP), aimed at biologically plausible modeling of large brain regions. This partnership accelerated the evolution of SpiNNaker, leading to the development of high-level software libraries (such as sPyNNaker) that simplified neural network design and automated task distribution across processor cores.

Transition to a Fully Functional System (2013–2016)

During this phase, hardware clusters containing dozens of chips and thousands of ARM cores were built, enabling the simulation of far more complex neural structures. The enhanced computational power opened new possibilities, not only in neurobiology but also in robotics, machine learning, and real-time sensor data analysis. The availability of user-friendly tools encouraged a broader range of researchers to adopt SpiNNaker, integrating it into projects focused on neuronal plasticity, learning algorithms, and distributed computing.

Refinement and Expansion of Applications (2017 – Present)

The introduction of SpiNNaker2 marked a significant leap forward. The new version features an increased number of cores, built-in hardware accelerators, and advanced power management mechanisms, such as Adaptive Body Biasing (ABB) and Dynamic Voltage and Frequency Scaling (DVFS), which we’ll discuss in more detail later. These improvements have made the system significantly more efficient. In research, SpiNNaker2 excels in hybrid modes, supporting both SNN and DNN architectures and broadening its range of applications — from biomedical simulations to smart city systems. Its ability to adapt to uncertainty and dynamic environments highlights the practical value of its event-driven architecture.

SpiNNaker2 Hardware Architecture

SpiNNaker2 embodies the principle of scalable neuromorphic computing — from simple processing nodes to supercomputing systems capable of simulating millions of biological neurons in real time. One of the significant technological advancements was the transition from the 130nm process used in the first SpiNNaker version to a 22nm FD-SOI process, which enhanced integration density and improved energy efficiency. Key features include Adaptive Body Biasing (ABB) and Dynamic Voltage and Frequency Scaling (DVFS), providing an optimal balance between performance and power consumption.
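To get a feel for what DVFS buys, here is a back-of-the-envelope sketch in Python. It assumes the classic CMOS dynamic-power relation P ≈ C·V²·f (with capacitance normalized out, so only ratios are meaningful) and plugs in the two power levels quoted for SpiNNaker2 in this article; it is an illustrative estimate, not a measurement of the actual chip.

```python
# Illustrative back-of-the-envelope estimate of DVFS savings.
# Assumes the classic CMOS dynamic-power model P ~ C * V^2 * f;
# capacitance C is normalized to 1, so only ratios are meaningful.

def dynamic_power(voltage_v: float, freq_mhz: float) -> float:
    """Relative dynamic power in arbitrary units (C normalized to 1)."""
    return voltage_v ** 2 * freq_mhz

# Power levels quoted for SpiNNaker2 in the article:
pl1 = dynamic_power(0.50, 200)   # low-power level, 0.50 V / 200 MHz
pl3 = dynamic_power(0.60, 400)   # high-performance level, 0.60 V / 400 MHz

# Energy per unit of work scales as P / f, i.e. as V^2:
energy_ratio = (pl1 / 200) / (pl3 / 400)
print(f"PL1 uses {energy_ratio:.0%} of PL3's energy per operation")
```

The point of the sketch: because energy per operation scales with V², dropping the supply voltage is far more valuable than dropping the frequency alone, which is exactly why near-threshold operation (enabled by ABB) is combined with frequency scaling.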

1. Processing Element (PE)
At the core of the system is the Processing Element (PE) — a specialized unit responsible for local computations and neural dynamics modeling. Each PE includes:

  • ARM Cortex-M4F core with an integrated floating-point unit, offering flexibility for implementing algorithms and simulating neural processes.

  • Hardware MAC array to accelerate multiply-accumulate operations, critical for running convolutional layers and matrix multiplications in both SNNs and traditional deep neural networks (DNNs).

  • Exponential and logarithmic computation modules for nonlinear functions and modeling synaptic plasticity.

  • Random Number Generators (RNGs) — both pseudo and true — supporting stochastic processes and regularization algorithms.

  • 128 KB of on-chip SRAM, divided into multiple banks to minimize data access conflicts and accelerate event processing.

This modular setup allows each PE to emulate thousands of biological neurons while maintaining high energy efficiency.
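To make the PE's workload concrete, here is a minimal Python sketch of an event-driven leaky integrate-and-fire (LIF) neuron: between incoming spikes the membrane potential decays exponentially, which is exactly the kind of computation the PE's exponential unit accelerates. The class name and all parameter values are illustrative, not taken from SpiNNaker2's firmware.

```python
import math

class LIFNeuron:
    """Minimal event-driven leaky integrate-and-fire neuron.

    Between spikes the membrane potential decays exponentially with
    time constant tau; each incoming spike adds its synaptic weight.
    All parameters are illustrative, not SpiNNaker2 values.
    """

    def __init__(self, tau_ms: float = 20.0, threshold: float = 1.0):
        self.tau = tau_ms
        self.threshold = threshold
        self.v = 0.0          # membrane potential
        self.last_t = 0.0     # time of the last processed event (ms)

    def receive_spike(self, t_ms: float, weight: float) -> bool:
        """Process one input spike; return True if the neuron fires."""
        # Exponential decay since the previous event -- this is where
        # a hardware exp unit pays off in a real neuromorphic core.
        self.v *= math.exp(-(t_ms - self.last_t) / self.tau)
        self.last_t = t_ms
        self.v += weight
        if self.v >= self.threshold:
            self.v = 0.0      # reset after firing
            return True
        return False

neuron = LIFNeuron()
# Two closely spaced spikes sum up and cross the threshold:
print(neuron.receive_spike(0.0, 0.6))   # False
print(neuron.receive_spike(1.0, 0.6))   # True
```

Note that the neuron only computes when an event arrives; between spikes it costs nothing. Multiplied across thousands of neurons per PE, this event-driven style is the source of the platform's energy efficiency.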

Fig. 1. Processing Element (PE)

2. Quad Processing Element (QPE)

To enhance communication efficiency and streamline scalability, four PEs are grouped into a Quad Processing Element (QPE) cluster. Each QPE features:

  • Four interconnected PEs sharing a local router for rapid data exchange.

  • A local GALS-based network, enabling each PE to operate within its own clock domain (adjusted via DVFS) while synchronizing asynchronously with neighboring units through dedicated bridges.

This architecture reduces spike transmission delays and optimizes workload distribution, allowing real-time energy adaptation based on computational demands.
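The GALS idea inside a QPE can be sketched as a toy model: each PE advances on its own local "clock", and spike events cross clock domains through buffered queues at the shared router rather than on a common clock edge. The structure below is purely illustrative and is not the hardware handshake protocol.

```python
from collections import deque

# Toy GALS model: each PE owns an inbox queue; the shared local
# router moves spike events between PEs asynchronously. There is
# no global clock -- a PE simply drains its inbox whenever its own
# clock domain schedules it. Names are illustrative, not hardware.

NUM_PES = 4
inboxes = [deque() for _ in range(NUM_PES)]

def router_send(dst_pe: int, event: str) -> None:
    """The local router buffers the event in the target PE's inbox."""
    inboxes[dst_pe].append(event)

def pe_step(pe_id: int) -> list:
    """One local 'tick' of a PE: drain and handle its pending events."""
    handled = []
    while inboxes[pe_id]:
        handled.append(inboxes[pe_id].popleft())
    return handled

# PEs 0 and 1 emit spikes toward PE 2 at unrelated moments:
router_send(2, "spike from PE0")
router_send(2, "spike from PE1")
print(pe_step(2))   # both events, in arrival order
print(pe_step(3))   # nothing pending for PE3
```

The buffering is what decouples the clock domains: a sender never waits for the receiver's clock, which is why each PE can run at a DVFS-chosen frequency of its own.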

Fig. 2. Quad Processing Element (QPE) architecture

3. SpiNNaker2 Chip

At the chip level, multiple QPEs are integrated to form a high-performance neuromorphic system. Each chip houses 38 QPEs, totaling 152 PEs, providing significant computational density within a compact footprint. Key technologies include:

  • Adaptive Body Biasing (ABB): Dynamically adjusts transistor thresholds for near-threshold operation, boosting performance by up to 10× at around 0.50V.

  • Dynamic Voltage and Frequency Scaling (DVFS): Automatically adjusts core frequencies and supply voltages based on workload demands (e.g., PL1: 0.50V/100–200 MHz and PL3: 0.60V/400 MHz), achieving up to 60% energy savings.

  • Network-on-Chip (NoC): A dual-channel NoC architecture — Data NoC (DNoC) for neuron spikes and DMA transfers, and Configuration NoC (CNoC) for control messages — ensures efficient data flow. The Address Event Representation (AER) protocol further enhances spike routing efficiency.

Parallel routing at the chip level enables efficient message passing, even under high event loads, while the GALS architecture eliminates the need for a global clock.
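The AER principle is simple enough to sketch in a few lines: a spike is transmitted not as a waveform but as a compact key identifying its source, and the routers deliver that key to every subscriber. The 32-bit field layout below (chip coordinates, core, neuron index) is a hypothetical example for illustration, not the actual SpiNNaker2 packet format.

```python
# Hypothetical 32-bit AER key: identifies *who* spiked, nothing more.
# Field widths are illustrative, not the real SpiNNaker2 format:
#   [31:24] chip x | [23:16] chip y | [15:11] core | [10:0] neuron

def encode_spike(chip_x: int, chip_y: int, core: int, neuron: int) -> int:
    """Pack a spike source into a single 32-bit routing key."""
    assert chip_x < 256 and chip_y < 256 and core < 32 and neuron < 2048
    return (chip_x << 24) | (chip_y << 16) | (core << 11) | neuron

def decode_spike(key: int):
    """Unpack a routing key back into (chip_x, chip_y, core, neuron)."""
    return ((key >> 24) & 0xFF, (key >> 16) & 0xFF,
            (key >> 11) & 0x1F, key & 0x7FF)

key = encode_spike(chip_x=3, chip_y=7, core=12, neuron=511)
print(hex(key), decode_spike(key))
```

Because a spike carries no payload beyond its source address, a router only needs a key lookup to fan the event out to all destination cores, which keeps per-spike latency low even under heavy event load.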

Fig. 3. SpiNNaker2 chip topology

4. Evaluation Board

SpiNNaker2 chips are mounted on specialized evaluation boards that bridge individual chips to larger-scale systems. These boards feature:

  • External DRAM connectivity for storing large datasets, weight matrices, and intermediate results.

  • Standard interfaces such as UART, SPI, I2C, JTAG, and Ethernet for easy configuration, debugging, and system integration.

  • Modular design allowing several SpiNNaker2 chips (six or more in some configurations) to be installed on a single board, facilitating the construction of large-scale computing clusters.

5. SpiNNaker2-Based Supercomputing Systems

Thanks to its modular and scalable architecture, SpiNNaker2 boards can be interconnected to form supercomputing systems:

  • Scalability to millions of cores: Current setups include up to 5 million processing cores, with future designs aiming for 10 million, enabling detailed simulations of sensory circuits, cortical regions, and even entire brain sections.

  • Energy efficiency and performance: ABB, DVFS, and specialized hardware accelerators (MAC arrays, exponential/logarithmic units, RNGs) offer substantial performance gains while significantly reducing power consumption.

  • Asynchronous inter-board communication: GALS-based NoC enables efficient, asynchronous communication between boards, simplifying system scaling without complex global synchronization.

This concludes Part I of our deep dive into the SpiNNaker platform. The sheer volume of material on this groundbreaking project has led us to split the article into two parts. In the next installment, we will focus on SpiNNaker2’s software ecosystem, explore the tools supporting this architecture, compare SpiNNaker2 to IBM’s TrueNorth, and discuss key projects leveraging SpiNNaker2 in research and applied domains.

Stay tuned and see you soon in our blog!

Thank you for being with us!
Sincerely, the MemriLab team.

Sources:

  1. The SpiNNaker2 Processing Element Architecture for Hybrid Digital Neuromorphic Computing

  2. SpiNNaker2: A Large-Scale Neuromorphic System for Event-Based and Asynchronous Machine Learning

  3. Efficient Reward-Based Structural Plasticity on a SpiNNaker 2 Prototype

  4. The SpiNNaker Project

  5. SpiNNaker: A 1-W 18-Core System-on-Chip for Massively-Parallel Neural Network Simulation