Artificial Kuramoto Oscillatory Neurons

Takeru Miyato¹      Sindy Löwe²      Andreas Geiger¹      Max Welling²

¹ University of Tübingen, Tübingen AI Center
² University of Amsterdam

We introduce Artificial Kuramoto Oscillatory Neurons (AKOrN) as a dynamical alternative to threshold units, which can be combined with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. Our generalized Kuramoto updates bind neurons together through their synchronization dynamics. We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, calibrated uncertainty quantification, and reasoning. We believe that these empirical results show the importance of rethinking our assumptions at the most basic neuronal level of neural representation, and in particular show the importance of dynamical representations.

ICLR 2025 (Oral)

Synchrony

Synchronization is ubiquitous in nature and is a key mechanism for information processing in the brain. A famous model of this phenomenon is the Kuramoto model, a simple dynamical model that describes the synchronization of coupled oscillators. We generalize the Kuramoto model and incorporate it into neuronal dynamics.
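For reference, the classical Kuramoto model evolves $C$ phase oscillators $\theta_i$ toward alignment through their pairwise phase differences:

$$ \dot{\theta}_i = \omega_i + \frac{K}{C} \sum_{j=1}^{C} \sin(\theta_j - \theta_i) $$

where $\omega_i$ is the natural frequency of oscillator $i$ and $K$ is the coupling strength.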

Artificial Kuramoto Oscillatory Neurons (AKOrN)

Our generalized Kuramoto oscillatory neuron model.

We use a multi-dimensional vector version of the Kuramoto model with a symmetry-breaking term. Oscillators are denoted by ${\bf X}=\{{\bf x}_i\}_{i=1}^C$, where each ${\bf x}_i$ moves on the $(N{-}1)$-sphere: ${\bf x}_i \in \mathbb{R}^{N},~\|{\bf x}_i\|_2=1$. Here, $N$ is the dimension of each oscillator and $C$ is the number of oscillators. The differential equation of our vector-valued Kuramoto model is:

$$ \dot{\bf x}_i = {\bf \Omega}_i{\bf x}_i + {\rm Proj}_{{\bf x}_i}\Big({\bf c}_i + \sum_{j=1}^{C} {\bf J}_{ij} {\bf x}_j \Big)~~{\rm where}~~{\rm Proj}_{{\bf x}_i}({\bf y}_i) = {\bf y}_i - \langle {\bf y}_i, {\bf x}_i \rangle {\bf x}_i $$

Here, ${\bf \Omega}_i$ is an $N\times N$ anti-symmetric matrix, and ${\bf \Omega}_i {\bf x}_i$ is the natural frequency term that determines each oscillator's own rotation frequency and rotation plane. The second term governs interactions between oscillators: each ${\bf J}_{ij} \in \mathbb{R}^{N\times N}$ is a coupling matrix between oscillators $i$ and $j$, and ${\rm Proj}_{{\bf x}_i}$ is an operator that projects an input vector onto the tangent space of the sphere at ${\bf x}_i$, which keeps ${\bf x}_i$ on the unit sphere.

${\bf C}=\{{\bf c}_i\}_{i=1}^C,~{\bf c}_i \in \mathbb{R}^{N}$ is a data-dependent variable computed from the observational input or from the activations of the previous layer. ${\bf c}_i$ can be seen as another oscillator with a unidirectional connection to ${\bf x}_i$. Since ${\bf c}_i$ is not affected by any oscillator, it strongly binds ${\bf x}_i$ to its own direction, i.e., it acts as a bias direction. In physics lingo, ${\bf C}$ is often referred to as a "symmetry-breaking" field.
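To make the update concrete, here is a minimal NumPy sketch of one discretized (explicit Euler) step of the equation above. The step size `dt`, the tensor shapes, and the renormalization back onto the sphere are our illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def kuramoto_step(X, C_field, J, Omega, dt=0.1):
    """One Euler step of the vector-valued Kuramoto update.

    X:       (C, N) oscillators, each row on the unit sphere
    C_field: (C, N) symmetry-breaking stimuli c_i
    J:       (C, C, N, N) pairwise coupling matrices J_ij
    Omega:   (C, N, N) anti-symmetric natural-frequency matrices
    """
    # natural rotation term: Omega_i x_i
    rot = np.einsum('inm,im->in', Omega, X)
    # interaction input: c_i + sum_j J_ij x_j
    y = C_field + np.einsum('ijnm,jm->in', J, X)
    # project y_i onto the tangent space of the sphere at x_i
    y_tan = y - np.sum(y * X, axis=1, keepdims=True) * X
    # Euler step, then renormalize to stay on the unit sphere
    X_new = X + dt * (rot + y_tan)
    return X_new / np.linalg.norm(X_new, axis=1, keepdims=True)
```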

Network with AKOrN

We utilize artificial Kuramoto oscillatory neurons (AKOrN) as the basic unit of information processing in neural networks (figure below). First, we transform an observation with a relatively simple function to create the initial conditional stimuli ${\bf C}^{(0)}$. Next, ${\bf X}^{(0)}$ is typically initialized with random vectors on the sphere. Each block is composed of two modules, the Kuramoto layer and the readout module, which together process the pair $\{{\bf X}, {\bf C}\}$: the Kuramoto layer updates ${\bf X}$ given the conditional stimuli ${\bf C}$, and the readout module extracts features from the final oscillatory states to create the new conditional stimuli for the next block.

A network with AKOrN for image processing. Each layer consists of a Kuramoto layer and a readout module described in Sec. 4 of the paper. ${\bf C}^{(L)}$ is used to make the final prediction of our model.
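As a rough sketch of how one such block could be wired, reusing `kuramoto_step` from above; the single linear readout `W_read` here is a hypothetical stand-in for the paper's readout module:

```python
def akorn_block(C_in, J, Omega, W_read, T=8, dt=0.1,
                rng=np.random.default_rng(0)):
    """One AKOrN block: T Kuramoto steps, then a readout.

    C_in:   (C, N) conditional stimuli from the previous block
    W_read: (N, N) readout weights (simplified stand-in)
    Returns the new conditional stimuli for the next block.
    """
    # initialize oscillators randomly on the unit sphere
    X = rng.standard_normal(C_in.shape)
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    # run the Kuramoto dynamics driven by the conditional stimuli
    for _ in range(T):
        X = kuramoto_step(X, C_in, J, Omega, dt)
    # readout: extract features from the final oscillatory states
    return X @ W_read
```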

Visualization of the oscillators at the first and third layers, trained with self-supervised learning on ImageNet. The shallow layer learns high-frequency, local features, while the deeper layer generates global, low-frequency waves.

Results

We test AKOrN models on unsupervised object discovery, reasoning tasks (Sudoku), and image classification. AKOrN strongly binds object features, achieving performance competitive with slot-based models in object discovery; it enhances the reasoning capability of self-attention; and it increases robustness against random, adversarial, and natural perturbations with surprisingly good calibration.

| Model | PascalVOC $MBO_i$ | PascalVOC $MBO_c$ | COCO2017 $MBO_i$ | COCO2017 $MBO_c$ |
|---|---|---|---|---|
| Slot-attention | 22.2 | 23.7 | 24.6 | 24.9 |
| SLATE | 35.9 | 41.5 | 29.1 | 33.6 |
| DINOSAUR | 44.0 | 51.2 | 31.6 | 39.7 |
| Slot-diffusion | 50.4 | 55.3 | 31.0 | 35.0 |
| SPOT | 48.3 | 55.6 | 35.0 | 44.7 |
| AKOrN (Ours) | 52.0 | 60.3 | 31.3 | 40.3 |

Unsupervised object discovery performance on natural images (PascalVOC and COCO), measured by instance-level and class-level mean best overlap ($MBO_i$ / $MBO_c$). All other models are slot-based. AKOrN achieves the best performance on PascalVOC and the second best on COCO. Ours is the first work showing that a model other than slot-based models achieves competitive performance on object discovery.

Observations

Here we would like to share some observations that we found interesting and useful for future research.

The energy value tells the confidence of the prediction

The original AKOrN Sudoku reasoning model has 18% accuracy on OOD boards, but this jumps to ~90% by increasing test-time compute. There are two ways to improve the performance: (1) test-time extension of the Kuramoto steps and (2) energy-based voting. The test-time extension of the Kuramoto steps, i.e., iterating more Kuramoto steps for difficult boards, improves the performance from 18% to 51%. Energy-based voting, i.e., selecting the lowest-energy oscillator states as the final prediction, further improves the performance from 51% to 90%. These results imply that the Kuramoto layer behaves like an energy-based model, even though its parameters are optimized solely based on the task objective.
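For intuition, here is a minimal sketch of what such an energy and the voting could look like, reusing the shapes and `kuramoto_step` from the sketches above. The exact energy function and voting procedure in the paper may differ; the quadratic energy form and the choice of voting over random initializations are our assumptions for illustration.

```python
def energy(X, C_field, J):
    """Energy of the coupled system: low when oscillators align
    with their couplings and with the symmetry-breaking field."""
    pair = np.einsum('in,ijnm,jm->', X, J, X)
    field = np.sum(C_field * X)
    return -pair - field

def energy_voting(C_in, J, Omega, T=64, K=16, dt=0.1,
                  rng=np.random.default_rng(0)):
    """Run K random initializations; keep the lowest-energy final state."""
    best_X, best_E = None, np.inf
    for _ in range(K):
        X = rng.standard_normal(C_in.shape)
        X /= np.linalg.norm(X, axis=1, keepdims=True)
        for _ in range(T):  # test-time extension: T can exceed the training value
            X = kuramoto_step(X, C_in, J, Omega, dt)
        E = energy(X, C_in, J)
        if E < best_E:
            best_X, best_E = X, E
    return best_X, best_E
```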

Visualization of the oscillators (middle) and the predictions (bottom) along the Kuramoto steps. While the model is unable to solve this board within 16 steps (the number of steps used during training), it solves it within 64 steps.

Models with large oscillator dimensions lose the ability to bind features and to reason robustly

We found that models with large oscillator dimensions lose the ability to bind features and to reason robustly. These failures cannot be identified by looking at the training error: increasing the oscillator dimension does not harm the training process, but the performance unique to AKOrN decreases in every task we tested.

Sudoku reasoning performance vs. oscillator dimension. Only the models with $N=4, 8, 16$ achieve good performance, while models with $N=32$ or more perform poorly and do not even improve with the test-time extension of Kuramoto steps.