Exploring a neural dataset

Revealing the latent structure of fMRI responses to natural scenes


August 26, 2023


March 10, 2024

Analyzing a large-scale fMRI dataset containing neural responses to natural scenes reveals high-dimensional latent structure in neural responses described by a power-law covariance eigenspectrum.
The Natural Scenes fMRI Dataset

The Natural Scenes Dataset (NSD) is the largest fMRI dataset on human vision, with 7T fMRI responses (1.8 mm isotropic voxels) obtained from 8 adult participants (Allen et al. 2021). During the experiment, participants performed a continuous recognition task while viewing natural scene images from the Microsoft Common Objects in Context (COCO) database (Lin et al. 2014).

Let’s load the dataset. It contains responses to 700 images from ~15,000 voxels that were reliably modulated by the visual stimuli during the NSD experiment.

Load the dataset
data = average_data_across_repetitions(load_dataset(subject=0, roi="general"))

<xarray.DataArray 'fMRI betas' (presentation: 700, neuroid: 15724)> Size: 44MB
0.4915 0.2473 0.08592 0.05828 -0.1315 ... -0.2126 -0.6315 -0.5751 -0.5354
Coordinates: (3/4)
    x            (neuroid) uint8 16kB 12 12 12 12 12 12 12 ... 72 72 72 72 72 72
    y            (neuroid) uint8 16kB 21 22 22 22 22 22 23 ... 29 29 30 30 30 31
    ...           ...
    stimulus_id  (presentation) object 6kB 'image02950' ... 'image72948'
Dimensions without coordinates: presentation, neuroid
Attributes: (3/8)
    resolution:      1pt8mm
    preprocessing:   fithrf_GLMdenoise_RR
    ...              ...
    postprocessing:  averaged across first two repetitions
Some fMRI preprocessing details

We used the NSD single-trial betas, preprocessed in 1.8 mm volumetric space and denoised using the GLMdenoise technique (version 3; “betas_fithrf_GLMdenoise_RR”). The betas were z-scored within each scanning session and then averaged across repetitions of each stimulus.
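The two postprocessing steps (z-scoring within session, then averaging across repetitions) can be sketched in NumPy as follows. This is a minimal illustration, not the NSD or author pipeline: it assumes hypothetical arrays `betas` (trials × voxels), `session` (one session label per trial), and `stimulus` (one stimulus ID per trial).

```python
import numpy as np


def zscore_within_session(betas: np.ndarray, session: np.ndarray) -> np.ndarray:
    """Z-score each voxel's betas separately within every scanning session."""
    out = np.empty_like(betas, dtype=float)
    for s in np.unique(session):
        mask = session == s
        block = betas[mask]
        # Mean and standard deviation are computed per voxel over this session's trials
        out[mask] = (block - block.mean(axis=0)) / block.std(axis=0)
    return out


def average_across_repetitions(betas: np.ndarray, stimulus: np.ndarray):
    """Average trials sharing a stimulus ID; returns (unique IDs, averaged betas)."""
    ids, inverse = np.unique(stimulus, return_inverse=True)
    sums = np.zeros((len(ids), betas.shape[1]))
    np.add.at(sums, inverse, betas)              # accumulate each trial into its stimulus row
    counts = np.bincount(inverse, minlength=len(ids))
    return ids, sums / counts[:, None]
```

The order matters: z-scoring first removes session-to-session shifts in response scale before trials from different sessions are pooled.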

Here are some examples of stimuli seen by the participants.

Load the stimuli
stimuli = load_stimuli()

The neural covariance eigenspectrum

Now we can apply PCA to the neural responses and plot the eigenspectrum of their covariance!

Visualize the eigenspectrum
pca = PCA()


Neural covariance eigenspectrum with linear scaling
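For reference, the eigenspectrum can be computed either from an explicit covariance matrix or with scikit-learn’s `PCA`, whose `explained_variance_` attribute holds the covariance eigenvalues in decreasing order. A minimal sketch, assuming the responses are a NumPy array `X` of shape (stimuli × voxels); random data stands in for the fMRI betas here, and `covariance_eigenspectrum` is a hypothetical helper.

```python
import numpy as np
from sklearn.decomposition import PCA


def covariance_eigenspectrum(X: np.ndarray) -> np.ndarray:
    """Eigenvalues of the (unbiased) covariance across stimuli, in decreasing order."""
    Xc = X - X.mean(axis=0)                      # center each voxel
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # voxels × voxels covariance
    eigvals = np.linalg.eigvalsh(cov)            # ascending order for symmetric matrices
    return eigvals[::-1]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                   # stand-in for data.values
spectrum = covariance_eigenspectrum(X)
pca = PCA().fit(X)                               # pca.explained_variance_ matches spectrum
```

Plotting `spectrum` against `np.arange(1, len(spectrum) + 1)` gives the rank-ordered eigenspectrum. With ~15,000 voxels, the explicit covariance matrix would be very large, so the SVD route taken by `PCA` on the 700 × 15,724 data matrix is the more practical choice.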

There are some simple ways to visualize and interpret the principal components.

The first method is to plot the stimuli on a scatter plot, using their scores along two principal components of interest as the x and y coordinates. This lets us look for potential clusters of stimuli.

Alternatively, we can focus on the stimuli with the highest or lowest scores along a given principal component. These give simple clues about what the PC might be sensitive to, which could be visual features ranging from low to high complexity.
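Both approaches only need the stimulus scores along each PC, i.e. the projections returned by `PCA.fit_transform`. A minimal sketch with random stand-in data; `extreme_stimuli` is a hypothetical helper, and its returned indices would be used to look up the corresponding images among the loaded stimuli.

```python
import numpy as np
from sklearn.decomposition import PCA


def extreme_stimuli(scores: np.ndarray, pc: int, k: int = 5):
    """Indices of the k stimuli with the highest and lowest scores along one PC."""
    order = np.argsort(scores[:, pc])            # ascending order of scores
    return order[::-1][:k], order[:k]            # (highest, lowest)


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))                   # stand-in for the stimulus × voxel responses
scores = PCA(n_components=10).fit_transform(X)   # stimulus scores on the first 10 PCs
x, y = scores[:, 0], scores[:, 1]                # coordinates for a PC1-vs-PC2 scatter plot
top, bottom = extreme_stimuli(scores, pc=0, k=5)
```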

Interpreting PCs can be challenging, especially when we rely solely on visual inspection. This difficulty arises in part because many natural features are non-negative, whereas each PC combines positive and negative weights. As a result, methods like nonnegative matrix factorization (NMF) often yield more interpretable dimensions than PCA.

Similarly, we can inspect the stimuli with the highest or closest-to-zero values along each NMF dimension.
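scikit-learn’s `NMF` is one way to obtain such dimensions. NMF requires nonnegative input, whereas z-scored betas are not, so this sketch assumes a nonnegative transformation; shifting each voxel so its minimum is zero is an arbitrary choice made only for illustration, as is the number of components.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 30))                   # stand-in for the z-scored responses
X_nonneg = X - X.min(axis=0)                     # illustrative shift so every entry is >= 0

nmf = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X_nonneg)                  # stimulus × dimension weights, all >= 0
H = nmf.components_                              # dimension × voxel loadings, all >= 0

dim = 0
order = np.argsort(W[:, dim])
highest = order[::-1][:5]                        # stimuli with the largest weights
closest_to_zero = order[:5]                      # weights are >= 0, so smallest = nearest zero
```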

Nonetheless, PCA has unique benefits that shouldn’t be overlooked. For instance, it has a closed-form solution and deterministic outcomes, and its properties are well characterized mathematically. Moreover, because PCA is essentially a rotation of the data, it preserves all of the original information in the dataset when all components are kept.
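A quick numerical check of these properties: computing PCA from the singular value decomposition of the centered data (one standard route, used here for illustration with random stand-in data) gives orthonormal principal axes, so keeping all of them rotates the data without loss, and the original responses can be recovered exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 10))                    # stand-in: 40 stimuli × 10 voxels
mean = X.mean(axis=0)
Xc = X - mean

# Rows of Vt are the principal axes; the SVD is deterministic up to the sign of each axis
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = Xc @ Vt.T                               # rotate into the PC basis
X_back = scores @ Vt + mean                      # rotate back and undo the centering

orthogonal = np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))
lossless = np.allclose(X_back, X)
same_total_variance = np.isclose(scores.var(axis=0).sum(), Xc.var(axis=0).sum())
```

With fewer stimuli than voxels, as in NSD, `Vt` has orthonormal rows that span the centered data, so the reconstruction is still exact.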

On the linear-scale plot, the first few PCs appear to carry substantial variance while the rest look negligible, which suggests a low-dimensional structure.

However, when values span several orders of magnitude, as the eigenvalues of high-dimensional data often do, it’s more insightful to plot them on a logarithmic scale, which makes many key statistical trends more apparent. Let’s visualize the spectrum with logarithmic scaling on both axes:

Neural covariance eigenspectrum with logarithmic scaling

On a log-log scale, the spectrum shows a trend that looks remarkably linear, suggesting that the eigenspectrum might obey a power-law distribution:

\begin{align*} \log{\lambda_\text{rank}} &\approx \alpha \log{\left( \text{rank} \right)} + c\\ \lambda_\text{rank} &\propto \left( \text{rank} \right)^\alpha \end{align*}
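The exponent \alpha and intercept c can be estimated with a least-squares line fit in log-log coordinates. A minimal sketch, assuming the eigenvalues are positive and sorted in decreasing order; `fit_power_law` and its fitting range (`start`, `stop`) are hypothetical, since the extreme ranks of an empirical spectrum are often noisy. This is only the simplest estimator, not necessarily the one used for the plot.

```python
import numpy as np


def fit_power_law(eigenvalues: np.ndarray, start: int = 1, stop=None):
    """Fit log(lambda_rank) ≈ alpha * log(rank) + c over ranks start..stop (1-indexed)."""
    rank = np.arange(1, len(eigenvalues) + 1)
    stop = len(eigenvalues) if stop is None else stop
    sel = (rank >= start) & (rank <= stop)
    alpha, c = np.polyfit(np.log(rank[sel]), np.log(eigenvalues[sel]), deg=1)
    return alpha, c


# Synthetic spectrum following lambda_rank = 2 * rank^(-1) exactly
ranks = np.arange(1, 701)
alpha, c = fit_power_law(2.0 * ranks ** -1.0)
```

For centered data with 700 stimuli, the last eigenvalue is numerically zero, so an empirical fit would need `stop=699` or smaller to avoid taking the log of zero.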

There appears to be no obvious cutoff in this power law, suggesting that there might be information across all ranks. The number of effective dimensions is therefore likely much higher than the linear-scale plot of the eigenspectrum suggested.
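One common way to put a number on effective dimensionality is the participation ratio, (\sum_i \lambda_i)^2 / \sum_i \lambda_i^2, which equals N for N equal eigenvalues and 1 when a single eigenvalue carries all the variance. It is offered here as an illustration, not as the measure used in this analysis; the spectra below are synthetic.

```python
import numpy as np


def participation_ratio(eigenvalues) -> float:
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    lam = np.asarray(eigenvalues, dtype=float)
    return float(lam.sum() ** 2 / (lam ** 2).sum())


ranks = np.arange(1, 701)
pr_flat = participation_ratio(np.ones(700))            # every dimension equally used
pr_single = participation_ratio([1.0] + [0.0] * 699)   # one dimension carries everything
pr_power = participation_ratio(ranks ** -1.0)          # power law with alpha = -1
```

For the \alpha = -1 power law over 700 ranks, the participation ratio comes out at about 31: far more than the handful of dimensions the linear-scale plot seems to show.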

Power laws

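A power law is a relationship of the form f(x) \propto x^{\alpha}, where \alpha is called the index of the power law, or the power-law exponent. It implies a scale-free structure: since f(kx) \propto (kx)^{\alpha} = k^{\alpha} x^{\alpha}, we have f(kx) \propto f(x), so rescaling x changes f only by a constant factor.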
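A small numerical check of the scale-free property: for a power law, the ratio f(kx)/f(x) equals the constant k^{\alpha} at every x, whereas for an exponential decay, which has a characteristic scale (`tau` below), the ratio depends on x. The exponent, scale, and factor `k` are arbitrary illustrative values.

```python
import numpy as np

alpha, tau, k = -1.5, 10.0, 2.0
x = np.array([1.0, 10.0, 100.0, 1000.0])


def power_law(x):
    return x ** alpha


def exponential(x):
    return np.exp(-x / tau)


ratio_power = power_law(k * x) / power_law(x)      # equals k**alpha at every x
ratio_exp = exponential(k * x) / exponential(x)    # exp(-(k - 1) * x / tau), changes with x
```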

Power laws are ubiquitous in nature, arising in all sorts of systems.

Nevertheless, a power-law relationship will not be observed when the data

  • are random, or
  • have a characteristic scale.

An analogy: word frequencies

Zipf’s law states that the frequencies of English words follow a power law: a word’s frequency is roughly inversely proportional to its frequency rank. Let’s compute the distribution of word frequencies in a small corpus, a collection of IMDb movie reviews.

view_word_frequency_distribution(counter, log=True)
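The `counter` above is presumably a word-count table. A minimal sketch of how one could be built and its Zipf exponent estimated, assuming a `collections.Counter`, a simple lowercase letters-and-apostrophes tokenizer (the tokenization behind the plot is not shown), and tiny made-up texts in place of the IMDb corpus; both helpers are hypothetical.

```python
import re
from collections import Counter

import numpy as np


def word_counts(texts):
    """Count lowercase word tokens across a list of documents."""
    counter = Counter()
    for text in texts:
        counter.update(re.findall(r"[a-z']+", text.lower()))
    return counter


def zipf_exponent(counter: Counter) -> float:
    """Slope of log(frequency) against log(rank); Zipf's law predicts roughly -1."""
    freqs = np.array(sorted(counter.values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), deg=1)
    return float(slope)


counter = word_counts(["The movie was the best movie.", "The plot was thin."])
```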

If only the high-frequency words were meaningful in human language, we should be able to reconstruct a movie review using only those words:

from IPython.display import HTML, display
print("An IMDb review")
display(HTML(f"<blockquote>{postprocess(' '.join(tokens))}</blockquote>"))
An IMDb review
i love sci-fi and am willing to put up with a lot. Sci-fi movies/tv are usually underfunded, under-appreciated and misunderstood. I tried to like this, I really did, but it is to good tv sci-fi as babylon 5 is to star trek (the original). Silly prosthetics, cheap cardboard sets, stilted dialogues, cg that doesn't match the background, and painfully one-dimensional characters cannot be overcome with a'sci-fi'setting. (I'm sure there are those of you out there who think babylon 5 is good sci-fi tv. It's not. It's clichéd and uninspiring.) while us viewers might like emotion and character development, sci-fi is a genre that does not take itself seriously (cf. Star trek). It may treat important issues, yet not as a serious philosophy. It's really difficult to care about the characters here as they are not simply foolish, just missing a spark of life. Their actions and reactions are wooden and predictable, often painful to watch. The makers of earth know it's rubbish as they have to always say gene roddenberry's earth... Otherwise people would not continue watching. Roddenberry's ashes must be turning in their orbit as this dull, cheap, poorly edited (watching it without advert breaks really brings this home) trudging trabant of a show lumbers into space. Spoiler. So, kill off a main character. And then bring him back as another actor. Jeeez! Dallas all over again.
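A reconstruction from high-frequency words only can be sketched by keeping the `k` most frequent words of the corpus and masking every other token. The helper name, the mask symbol, `k`, and the toy corpus are hypothetical choices for illustration.

```python
from collections import Counter


def keep_top_words(tokens, counter: Counter, k: int, mask: str = "___"):
    """Replace every token outside the k most frequent words with a mask symbol."""
    vocabulary = {word for word, _ in counter.most_common(k)}
    return [tok if tok in vocabulary else mask for tok in tokens]


counter = Counter("the cat sat on the mat and the dog sat too".split())
reconstruction = keep_top_words("the dog sat on the mat".split(), counter, k=2)
```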
This poor reconstruction demonstrates that the high-rank, low-frequency words also carry meaningful information; in fact, most of it. Analogously, if we try to reconstruct neural data with just a few high-variance principal components (which is exactly what typical dimensionality reduction methods do), we will likely lose valuable information about the presented stimulus.
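The neural analogue can be checked on synthetic data: generate responses whose covariance eigenspectrum follows a power law, reconstruct them from the top k principal components, and measure the fraction of variance left unexplained. The sizes, the exponent, and the helper `unexplained_variance` are assumptions chosen for illustration, not values or code from the NSD analysis.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
n_stimuli, n_voxels, alpha = 700, 300, -1.0

# Synthetic responses whose population covariance eigenvalues are rank^alpha
population_eigvals = np.arange(1, n_voxels + 1) ** alpha
basis, _ = np.linalg.qr(rng.normal(size=(n_voxels, n_voxels)))   # random orthonormal axes
X = (rng.normal(size=(n_stimuli, n_voxels)) * np.sqrt(population_eigvals)) @ basis.T


def unexplained_variance(X: np.ndarray, k: int) -> float:
    """Fraction of total variance lost when X is reconstructed from its top-k PCs."""
    pca = PCA(n_components=k, svd_solver="full").fit(X)   # exact, non-randomized solver
    X_hat = pca.inverse_transform(pca.transform(X))
    residual = ((X - X_hat) ** 2).sum()
    total = ((X - X.mean(axis=0)) ** 2).sum()
    return float(residual / total)


losses = {k: unexplained_variance(X, k) for k in (1, 5, 10, 50)}
```

With \alpha = -1, the population eigenvalues beyond rank 10 carry over half of the total variance, so keeping the top 10 PCs still discards a large share of the signal.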

