Install all required dependencies
%pip install git+https://github.com/BonnerLab/ccn-tutorial.git
Revealing the latent structure of fMRI responses to natural scenes
August 26, 2023
March 10, 2024
Here’s a link to this notebook on Google Colab.
The natural scenes dataset (NSD) is the largest fMRI dataset on human vision, with 7T fMRI responses (1.8 mm isotropic voxels) obtained from 8 adult participants (Allen et al. 2021). The experiment involved a continuous recognition task while participants observed natural scene images from the Microsoft Common Objects in Context (COCO) database (Lin et al. 2014).
Let’s load the dataset. It contains neural responses to 700 images, measured at ~15,000 voxels that were reliably modulated by the visual stimuli during the NSD experiment.
from collections import Counter
import warnings
import numpy as np
import pandas as pd
import xarray as xr
from torchtext.datasets import IMDB
from torchtext.data.utils import get_tokenizer
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA, NMF
from IPython.display import display, HTML
from matplotlib.figure import Figure
from mpl_toolkits.axes_grid1 import ImageGrid
from matplotlib import pyplot as plt
from matplotlib_inline.backend_inline import set_matplotlib_formats
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
import seaborn as sns
from utilities.brain import (
    load_dataset,
    average_data_across_repetitions,
    load_stimuli,
    plot_brain_map,
)
%matplotlib inline
sns.set_theme(
    context="notebook",
    style="white",
    palette="deep",
    rc={"legend.edgecolor": "None"},
)
set_matplotlib_formats("svg")
pd.set_option("display.max_rows", 5)
pd.set_option("display.max_columns", 10)
pd.set_option("display.precision", 3)
pd.set_option("display.show_dimensions", False)
xr.set_options(display_max_rows=3, display_expand_data=False)
warnings.filterwarnings("ignore")
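The loading cell isn’t reproduced here; below is a minimal sketch built from the helpers imported from utilities.brain, assuming load_dataset returns the single-trial betas and average_data_across_repetitions collapses the repeated presentations of each stimulus (consistent with the output that follows):

# Sketch of the loading step (exact helper signatures are assumptions):
# fetch the betas, then average the repeated presentations of each stimulus
data = average_data_across_repetitions(load_dataset())
data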
<xarray.DataArray 'fMRI betas' (presentation: 700, neuroid: 15724)> Size: 44MB
0.4915 0.2473 0.08592 0.05828 -0.1315 ... -0.2126 -0.6315 -0.5751 -0.5354
Coordinates: (3/4)
    x            (neuroid) uint8 16kB 12 12 12 12 12 12 12 ... 72 72 72 72 72 72
    y            (neuroid) uint8 16kB 21 22 22 22 22 22 23 ... 29 29 30 30 30 31
    ...          ...
    stimulus_id  (presentation) object 6kB 'image02950' ... 'image72948'
Dimensions without coordinates: presentation, neuroid
Attributes: (3/8)
    resolution:      1pt8mm
    preprocessing:   fithrf_GLMdenoise_RR
    ...              ...
    postprocessing:  averaged across first two repetitions
We utilized the NSD single-trial betas, preprocessed in 1.8 mm volumetric space and denoised using the GLMdenoise technique (version 3; “betas_fithrf_GLMdenoise_RR”). The betas were converted to Z-scores within each scanning session and averaged across repetitions for each stimulus.
Here are some examples of stimuli seen by the participants.
def view_stimuli(stimuli: xr.DataArray, *, n: int = 10) -> None:
    fig = plt.figure(figsize=(12, 4))
    image_grid = ImageGrid(
        fig=fig,
        rect=(1, 1, 1),
        nrows_ncols=(1, n),
        share_all=True,
    )
    for i_image in range(n):
        image_grid[i_image].imshow(stimuli[i_image])
        image_grid[i_image].axis("off")
    fig.show()
stimuli = load_stimuli()
view_stimuli(stimuli)
Now we can apply PCA to the neural responses and plot the eigenspectrum of their covariance!
def view_eigenspectrum(pca: PCA, *, log: bool = False) -> None:
    eigenvalues = pd.DataFrame(pca.explained_variance_, columns=["eigenvalue"]).assign(
        rank=1 + np.arange(pca.n_components_)
    )
    fig, ax = plt.subplots(figsize=(6, 6))
    sns.lineplot(
        ax=ax,
        data=eigenvalues.loc[eigenvalues["rank"] < pca.n_components_],
        x="rank",
        y="eigenvalue",
    )
    sns.despine(ax=ax)
    if log:
        ax.set_xscale("log")
        ax.set_yscale("log")
        ax.set_ylim(bottom=1)
    fig.show()
pca = PCA()
pca.fit(data)
view_eigenspectrum(pca)
There are some simple ways to visualize and interpret the principal components.
def view_images_as_scatterplot(
    x: np.ndarray, y: np.ndarray, *, stimuli: xr.DataArray
) -> None:
    fig, ax = plt.subplots(figsize=(10, 10))
    for i_stimulus in range(len(stimuli)):
        image_box = OffsetImage(stimuli[i_stimulus].values, zoom=0.3)
        image_box.image.axes = ax
        ab = AnnotationBbox(
            image_box,
            xy=(x[i_stimulus], y[i_stimulus]),
            xycoords="data",
            frameon=False,
            pad=0,
        )
        ax.add_artist(ab)
    ax.set_xlim([x.min(), x.max()])
    ax.set_ylim([y.min(), y.max()])
    ax.axis("off")
    fig.show()
def view_images_at_poles(
    x: np.ndarray,
    *,
    stimuli: xr.DataArray,
    n_images_per_pole: int = 5,
    label: str | None = None,
) -> None:
    indices = np.argsort(x, axis=0)
    fig = plt.figure(figsize=(12, 4))
    image_grid = ImageGrid(
        fig=fig,
        rect=(1, 1, 1),
        nrows_ncols=(1, 2 * n_images_per_pole + 1),
        share_all=True,
    )
    for i_image in range(n_images_per_pole):
        image_grid[i_image].imshow(stimuli[indices[i_image]])
        image_grid[i_image].axis("off")
        image_grid[-i_image - 1].imshow(stimuli[indices[-i_image - 1]])
        image_grid[-i_image - 1].axis("off")
    for ax in image_grid:
        ax.axis("off")
    if label is not None:
        ax = image_grid[n_images_per_pole]
        ax.text(
            0.5,
            0.5,
            label,
            horizontalalignment="center",
            verticalalignment="center",
            transform=ax.transAxes,
        )
    fig.show()
The first method is to plot the stimuli on a scatter plot, designating their X and Y coordinates to be their scores along two principal components of interest. This allows us to observe potential clustering of the stimuli.
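As a sketch (assuming the first two principal components are the ones of interest):

# Project the neural responses onto the principal components...
scores = pca.transform(data)

# ...and place each stimulus at its scores along the first two PCs
view_images_as_scatterplot(scores[:, 0], scores[:, 1], stimuli=stimuli)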
Alternatively, we can focus on the stimuli with the highest or lowest scores along a given principal component. This provides simple clues of what this PC might be sensitive to, which could be visual features ranging from low to high complexity.
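For instance, to view the stimuli at the poles of the first principal component:

# Show the stimuli with the lowest and highest scores along PC 1
view_images_at_poles(scores[:, 0], stimuli=stimuli, label="PC 1")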
Interpreting PCs can be challenging, especially when we rely solely on visual inspection. This difficulty arises in part because many natural features are non-negative, whereas principal components weight the data with a mixture of positive and negative coefficients. As a result, methods like nonnegative matrix factorization (NMF), which constrain their factors to be non-negative, often offer more interpretable dimensions than PCA.
Similarly, we can inspect the stimuli with highest or closest-to-zero values along each dimension.
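Here’s a minimal sketch: since NMF requires non-negative inputs, we first rescale each voxel’s responses to [0, 1] with the MinMaxScaler imported above; the number of components is an arbitrary choice for illustration.

# NMF needs non-negative inputs, so rescale each voxel's responses to [0, 1]
data_nonnegative = MinMaxScaler().fit_transform(data)

# Factorize the responses into non-negative dimensions (10 is arbitrary here)
nmf = NMF(n_components=10)
nmf_scores = nmf.fit_transform(data_nonnegative)

# Inspect the stimuli at the extremes of the first NMF dimension
view_images_at_poles(nmf_scores[:, 0], stimuli=stimuli, label="NMF dim 1")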
Nonetheless, PCA has unique benefits that shouldn’t be overlooked. For instance, PCA offers a closed-form solution and deterministic outcomes, and its mathematical properties are well characterized. Moreover, because PCA is essentially a rotation of the data, it preserves all the original information in the dataset.
On this plot, it looks as though the first few PCs have substantial variance while the rest are negligible, suggesting a low-dimensional structure.
However, when dealing with high-dimensional data that span several orders of magnitude, it’s more insightful to visualize them on a logarithmic scale, which makes many key statistical trends more apparent. Let’s try visualizing the spectrum on a logarithmic scale for both axes:
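Since view_eigenspectrum already takes a log flag, this is just a second call:

view_eigenspectrum(pca, log=True)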
On a log-log scale, the spectrum shows a trend that looks remarkably linear, suggesting that the eigenspectrum might obey a power-law distribution:
\begin{align*}
\log{\lambda_\text{rank}} &\approx \alpha \log{\left( \text{rank} \right)} + c \\
\lambda_\text{rank} &\propto \left( \text{rank} \right)^\alpha
\end{align*}
There appears to be no obvious cut-off point in this power law, suggesting that there might be information across all ranks. The number of effective dimensions here is likely much higher than what we would have expected from simply viewing the eigenspectrum on a linear scale.
A power law is a relationship of the form $f(x) \propto x^{\alpha}$, where $\alpha$ is termed the index of the power law, or the power-law exponent. It implies a scale-free structure, since rescaling the input only rescales the output: $f(kx) \propto (kx)^\alpha = k^\alpha x^\alpha \propto f(x)$.
Power laws are ubiquitous in nature, arising in all sorts of systems.
Nevertheless, a power-law relationship will not be observed when the data have a characteristic scale, which breaks this self-similarity.
Zipf’s law suggests that the usage frequencies of English words follow a power-law distribution. Let’s compute the distribution of word frequencies in a small corpus – a collection of IMDb movie reviews.
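The counting cell isn’t reproduced here; below is a minimal sketch using the torchtext IMDB corpus and the basic-English tokenizer imported above (the choice of the training split is an assumption):

# Tokenize the IMDb training reviews and tally how often each word occurs
tokenizer = get_tokenizer("basic_english")
dataset = IMDB(split="train")

counter = Counter()
for _, line in dataset:
    counter.update(tokenizer(line))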
def view_word_frequency_distribution(counter: Counter, *, log: bool = False) -> Figure:
    fig, ax = plt.subplots()
    sns.lineplot(
        ax=ax,
        data=pd.DataFrame(
            {
                "rank": 1 + np.arange(len(counter)),
                "word frequency": sorted(counter.values())[::-1],
            }
        ),
        x="rank",
        y="word frequency",
    )
    if log:
        ax.set_xscale("log")
        ax.set_yscale("log")
    sns.despine(ax=ax, offset=20)
    plt.close(fig)
    return fig
view_word_frequency_distribution(counter, log=True)
If only the high-frequency words are meaningful to human language, then we should be able to reconstruct a movie review using only those words:
# Grab the first review from the corpus
for _, line in dataset:
    tokens = tokenizer(line)
    break
def postprocess(text: str) -> str:
    """Restore capitalization and spacing around punctuation in detokenized text."""
    new_text = ""
    for i_char, char in enumerate(text):
        if i_char != 0:
            prev_char = text[i_char - 1]
        else:
            prev_char = None
        if i_char > 1:
            prev2_char = text[i_char - 2]
        else:
            prev2_char = None
        if i_char != len(text) - 1:
            next_char = text[i_char + 1]
        else:
            next_char = None
        if char == "i" and prev_char == " " and next_char == " ":
            new_text += "I"
        elif char == " " and (
            next_char == "."
            or next_char == ","
            or next_char == "'"
            or prev_char == "'"
            or prev_char == "("
            or next_char == ")"
            or next_char == "!"
        ):
            continue
        elif prev_char == " " and (prev2_char == "." or prev2_char == "!"):
            new_text += char.upper()
        else:
            new_text += char
    return new_text
print("An IMDb review")
display(HTML(f"<blockquote>{postprocess(' '.join(tokens))}</blockquote>"))
An IMDb review
i love sci-fi and am willing to put up with a lot. Sci-fi movies/tv are usually underfunded, under-appreciated and misunderstood. I tried to like this, I really did, but it is to good tv sci-fi as babylon 5 is to star trek (the original). Silly prosthetics, cheap cardboard sets, stilted dialogues, cg that doesn't match the background, and painfully one-dimensional characters cannot be overcome with a'sci-fi'setting. (I'm sure there are those of you out there who think babylon 5 is good sci-fi tv. It's not. It's clichéd and uninspiring.) while us viewers might like emotion and character development, sci-fi is a genre that does not take itself seriously (cf. Star trek). It may treat important issues, yet not as a serious philosophy. It's really difficult to care about the characters here as they are not simply foolish, just missing a spark of life. Their actions and reactions are wooden and predictable, often painful to watch. The makers of earth know it's rubbish as they have to always say gene roddenberry's earth... Otherwise people would not continue watching. Roddenberry's ashes must be turning in their orbit as this dull, cheap, poorly edited (watching it without advert breaks really brings this home) trudging trabant of a show lumbers into space. Spoiler. So, kill off a main character. And then bring him back as another actor. Jeeez! Dallas all over again.
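The regeneration cell isn’t reproduced here; below is a minimal sketch that drops every token not among the corpus’s 200 most frequent words (using most_common(200), an assumption based on the caption that follows):

# Keep only tokens that rank among the 200 most frequent words in the corpus
top_words = {word for word, _ in counter.most_common(200)}
filtered_tokens = [token for token in tokens if token in top_words]

print("Review regenerated using only the top 200 words")
display(HTML(f"<blockquote>{postprocess(' '.join(filtered_tokens))}</blockquote>"))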
Review regenerated using only the top 200 words
i love and to up with a lot. Are, and. I to like this, I really did, but it is to good as is to (the).,,, that doesn't the, and characters be with a''. (I'm there are those of you out there who think is good. It's not. It's and.) while us like and character, is a that does not take (.). It, not as a. It's really to about the characters here as they are not, just a of life. Their and are and, to watch. The of know it's as they have to say's... People would not watching.'s be in their as this,, (watching it really this) of a show into.. So, off a character. And then him back as another.! All over again.
This poor reconstruction demonstrates that the high-rank, low-frequency words also carry meaningful information – in fact, most of it. Analogously, if we try to reconstruct neural data with just a few high-variance principal components – which is exactly what typical dimensionality reduction methods do – we will likely lose valuable information about the presented stimulus.