We introduce Artificial Kuramoto Oscillatory Neurons (AKOrN) as a dynamical alternative to threshold units, which can be combined with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. Our generalized Kuramoto updates bind neurons together through their synchronization dynamics. We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, calibrated uncertainty quantification, and reasoning. We believe these empirical results show the importance of rethinking our assumptions at the most basic neuronal level of neural representation, and in particular the importance of dynamical representations.
ICLR2025 (Oral)
Synchronization is ubiquitous in nature and is a key mechanism for information processing in the brain. A famous example is the Kuramoto model, a simple model that describes the synchronization of oscillators. We generalize the Kuramoto model and incorporate it into the neuronal dynamics.
Our generalized Kuramoto oscillatory neuron model.
We use a multi-dimensional vector version of the Kuramoto model with a symmetry-breaking term. Oscillators are denoted by ${\bf X}=\{{\bf x}_i\}_{i=1}^C$, where each ${\bf x}_i$ moves on the $(N-1)$-sphere: ${\bf x}_i \in \mathbb{R}^{N},~\|{\bf x}_i\|_2=1$. Here, $N$ is the dimension of each single oscillator and $C$ is the number of oscillators. The differential equation of our vector-valued Kuramoto model is:
$$ \dot{\bf x}_i = {\bf \Omega}_i{\bf x}_i + {\rm Proj}_{{\bf x}_i}\Big({\bf c}_i + \sum_{j=1}^{C} {\bf J}_{ij} {\bf x}_j \Big),~{\rm where}~{\rm Proj}_{{\bf x}_i}({\bf y}_i) = {\bf y}_i - \langle {\bf y}_i, {\bf x}_i \rangle {\bf x}_i $$
Here, ${\bf \Omega}_i$ is an $N\times N$ anti-symmetric matrix and ${\bf \Omega}_i {\bf x}_i$ is the natural frequency term that determines each oscillator's own rotation frequency and rotation plane. The second term governs interactions between oscillators, where ${\rm Proj}_{{\bf x}_i}$ is an operator that projects an input vector onto the tangent space of the sphere at ${\bf x}_i$.
${\bf C}=\{{\bf c}_i\}_{i=1}^C,~{\bf c}_i \in \mathbb{R}^{N}$ is a data-dependent variable computed from the observational input or the activations of the previous layer. ${\bf c}_i$ can be seen as another oscillator that has a unidirectional connection to ${\bf x}_i$. Since ${\bf c}_i$ is not affected by any oscillator, it strongly binds ${\bf x}_i$ to its own direction, i.e., it acts as a bias direction. In physics lingo, ${\bf C}$ is often referred to as a "symmetry-breaking" field.
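To make the dynamics concrete, here is a minimal sketch of one Euler-discretized step of the equation above. The step size `gamma`, the dense coupling tensor `J`, and a single antisymmetric matrix `Omega` shared across oscillators are illustrative assumptions; the actual layer can parameterize the couplings with convolutional or attentive connectivity and use per-oscillator ${\bf \Omega}_i$.

```python
import torch

def kuramoto_step(x, c, J, Omega, gamma=0.1):
    """One Euler step of the vector Kuramoto ODE (a sketch, not the
    paper's exact implementation).
    x: (C, N) unit-norm oscillators, c: (C, N) conditional stimuli,
    J: (C, C, N, N) couplings, Omega: (N, N) antisymmetric matrix."""
    rot = x @ Omega.T                                # natural frequency term: Omega x_i
    y = c + torch.einsum('ijnm,jm->in', J, x)        # c_i + sum_j J_ij x_j
    proj = y - (y * x).sum(-1, keepdim=True) * x     # tangent-space projection at x_i
    x_new = x + gamma * (rot + proj)                 # Euler update
    return x_new / x_new.norm(dim=-1, keepdim=True)  # renormalize onto the sphere
```

The final renormalization keeps the discrete update on the unit sphere, mimicking the continuous dynamics, in which the projection preserves $\|{\bf x}_i\|_2=1$ exactly.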
We use artificial Kuramoto oscillatory neurons (AKOrN) as a basic unit of information processing in neural networks (Figure below). First, we transform an observation with a relatively simple function to create the initial conditional stimuli ${\bf C}^{(0)}$. Next, ${\bf X}^{(0)}$ is typically initialized by random vectors on the sphere. Each block is composed of two modules: the Kuramoto layer and the readout module, which together process the pair $\{{\bf X}, {\bf C}\}$. The Kuramoto layer updates ${\bf X}$ given the conditional stimuli ${\bf C}$, and the readout module extracts features from the final oscillatory states to create new conditional stimuli.
A network with AKOrN for image processing. Each layer consists of a Kuramoto layer and a readout module described in Sec 4. ${\bf C}^{(L)}$ is used to make the final prediction of our model.
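A hypothetical sketch of one such block follows: it runs `T` Kuramoto steps and then a readout that produces the next layer's conditional stimuli. The linear readout and the step count `T` are assumptions for illustration; the paper's readout module may take a different form.

```python
import torch
import torch.nn as nn

class AKOrNBlock(nn.Module):
    """One AKOrN block: a Kuramoto layer followed by a readout module."""
    def __init__(self, C, N, T=8, gamma=0.1):
        super().__init__()
        self.T, self.gamma = T, gamma
        self.J = nn.Parameter(0.01 * torch.randn(C, C, N, N))  # couplings J_ij
        A = torch.randn(N, N)
        self.Omega = nn.Parameter(A - A.T)      # antisymmetric at initialization
        self.readout = nn.Linear(C * N, C * N)  # assumed readout form

    def forward(self, x, c):
        # Kuramoto layer: iterate the update conditioned on the stimuli C
        for _ in range(self.T):
            x = kuramoto_step(x, c, self.J, self.Omega, self.gamma)
        # Readout module: turn final oscillator states into new stimuli
        c_next = self.readout(x.reshape(-1)).view_as(c)
        return x, c_next
```

Stacking $L$ such blocks and feeding ${\bf C}^{(L)}$ to a task head gives the overall pipeline in the figure above.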
Visualization of the oscillators at the first and third layers of a model trained with self-supervised learning on ImageNet. The shallow layer learns high-frequency, local features, while the deeper layer generates global, low-frequency waves.
We test AKOrN models on unsupervised object discovery, reasoning tasks (Sudoku), and image classification. We find that AKOrN strongly binds object features, achieving performance competitive with slot-based models in object discovery, enhances the reasoning capability of self-attention, and increases robustness against random, adversarial, and natural perturbations with surprisingly good calibration.
Here we would like to share some observations that we found interesting and useful for future research.
The original AKOrN Sudoku reasoning model achieves 18% accuracy on out-of-distribution (OOD) boards, but this jumps to ~90% with increased test-time compute. There are two ways to improve the performance: 1. test-time extension of the Kuramoto steps and 2. energy-based voting. Test-time extension, i.e., iterating more Kuramoto steps on difficult boards, improves the accuracy from 18% to 51%. Energy-based voting, i.e., selecting the lowest-energy oscillator states as the final prediction, further improves the accuracy from 51% to 90%. These results imply that the Kuramoto layer behaves like an energy-based model, even though its parameters are optimized solely based on the task objective.
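Both tricks are easy to sketch on top of the block above. The energy function below is the Kuramoto-style energy suggested by the update rule (the exact form used in the paper may differ), and `num_votes` and `extra_steps` are hypothetical knobs:

```python
import torch

def energy(x, c, J):
    # E(X) = -sum_i <c_i, x_i> - sum_{i,j} <x_i, J_ij x_j>  (assumed form)
    pair = torch.einsum('in,ijnm,jm->', x, J, x)
    return -(c * x).sum() - pair

def predict_with_voting(block, c, num_votes=16, extra_steps=64):
    """Run several randomly initialized rollouts with extended test-time
    compute and keep the lowest-energy final state."""
    best_x, best_e = None, float('inf')
    for _ in range(num_votes):
        x = torch.randn_like(c)
        x = x / x.norm(dim=-1, keepdim=True)         # random start on the sphere
        for _ in range(block.T + extra_steps):       # test-time extension of Kuramoto steps
            x = kuramoto_step(x, c, block.J, block.Omega, block.gamma)
        e = energy(x, c, block.J)
        if e < best_e:                               # energy-based voting
            best_x, best_e = x, e
    return best_x
```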
We found that large oscillator dimensions lose the ability to bind features and to reason robustly. We cannot identify those abilities by looking at the training error: increasing the oscillator dimension does not harm the training process, but the performance unique to AKOrN decreases in every task we tested.