%%| fig-cap: "Swimlane Diagram: Multimodal Music–Emotion Workflow — vertical lanes, all top-down"
flowchart TD
%% --- Audience: Heatmap Tracking (top-down) ---
subgraph AUD_TRACK["🎭 Heatmap Tracking"]
direction TB
A1["Track individual heatmaps"]
A1b[(🗄️ Store in a DBMS)]
end
%% --- Audience: Heatmap Embedding (top-down) ---
subgraph AUD_EMB["🎭 Heatmap Embedding"]
direction TB
A2["Compute heatmap embeddings"]
A3["Extract denoised emotional signals"]
end
%% --- Audio (top-down) ---
subgraph AUDIO["🎵 Audio Embedding"]
direction TB
B1["Model audio as stochastic process"]
B2["Compute synchronized audio embeddings"]
end
%% --- Model (top-down) ---
subgraph MODEL["🧠 Music ↔ Emotions"]
direction TB
C1["Implement and train ANN"]
C2["Evaluate causal influence"]
end
%% --- XAI (top-down) ---
subgraph XAI["✨ Explainable AI Analysis"]
direction TB
D1["Apply XAI methods"]
D2["Identify influential features"]
D3["Interpret emotional influence"]
end
%% --- Vertical ordering (stacked lanes, top → down) ---
%%AUD_TRACK --> AUD_EMB
%%AUD_EMB --> AUDIO
%%AUDIO --> MODEL
%%MODEL --> XAI
%% --- Internal top-down flows ---
A1 --> A1b
A1b --> A2
A2 --> A3
A3 --> C1
B1 --> B2
B2 --> C1
C1 --> C2
C2 --> D1
D1 --> D2
D2 --> D3
%% --- Styling ---
classDef audienceStyle fill:#F8FBFF,stroke:#0277bd,stroke-width:2px,color:#000
classDef embStyle fill:#e8f4ff,stroke:#0277bd,stroke-width:2px,color:#000
classDef audioStyle fill:#e8ffe8,stroke:#00796b,stroke-width:2px,color:#000
classDef modelStyle fill:#fff9d6,stroke:#f57f17,stroke-width:2px,color:#000
classDef xaiStyle fill:#f7e8ff,stroke:#6a1b9a,stroke-width:2px,color:#000
classDef nodeStyle fill:#ffffff,stroke:#333,stroke-width:1px,color:#000
class AUD_TRACK audienceStyle
class AUD_EMB embStyle
class AUDIO audioStyle
class MODEL modelStyle
class XAI xaiStyle
class A1,A1b,A2,A3,B1,B2,C1,C2,D1,D2,D3 nodeStyle
Methodology
Analyzing Music and Emotions Through Audience Response
Audience Heatmap Tracking
- Track a heatmap for each audience member's face throughout the opera performance
- Calculate embeddings for each tracked heatmap, treating the resulting sequences as stochastic processes
- Extract stochastic processes that suppress noise while preserving the emotional information conveyed during listening
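A minimal sketch of the embedding and denoising steps above, assuming each tracked face yields a sequence of heatmap frames. Here PCA (via SVD) keeps only the dominant components as the low-noise emotional signal; all array shapes, names, and the synthetic data are illustrative, not the project's actual pipeline.

```python
import numpy as np

def heatmap_embeddings(heatmaps):
    """Flatten per-frame heatmaps of shape (T, H, W) into per-frame
    embedding vectors of shape (T, H*W)."""
    T = heatmaps.shape[0]
    return heatmaps.reshape(T, -1)

def denoise_pca(X, n_components=2):
    """Project a (T, D) embedding sequence onto its top principal
    components and reconstruct, discarding low-variance (noisy)
    directions while keeping the dominant emotional trajectory."""
    mu = X.mean(axis=0)
    Xc = X - mu
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:n_components]           # top-k principal directions
    return (Xc @ Vk.T) @ Vk + mu     # rank-k reconstruction

# toy example: 100 frames of 8x8 heatmaps for one tracked face
rng = np.random.default_rng(0)
frames = rng.random((100, 8, 8))
X = heatmap_embeddings(frames)
X_denoised = denoise_pca(X, n_components=2)
print(X_denoised.shape)  # (100, 64)
```

The number of retained components trades noise suppression against fidelity; in practice it would be chosen from the explained-variance spectrum.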
Audio Embedding
- Calculate embeddings of the synchronized audio from the opera
- Model the audio embedding as a stochastic process capturing the musical features that influence emotions
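The synchronization step can be sketched as follows: the audio is sliced into windows aligned with the video frame rate, so each heatmap frame gets one audio embedding. The band-pooled log-spectrum used here is a stand-in for whatever embedding the project actually computes; the sample rate, frame rate, and band count are assumptions.

```python
import numpy as np

def audio_embeddings(signal, sr, frame_rate, n_bands=16):
    """Slice audio into windows aligned with the video frame rate and
    compute log-magnitude spectra pooled into coarse frequency bands,
    giving one embedding vector per video frame."""
    hop = sr // frame_rate                    # audio samples per video frame
    n_frames = len(signal) // hop
    embs = []
    for i in range(n_frames):
        win = signal[i * hop:(i + 1) * hop]
        mag = np.abs(np.fft.rfft(win))
        # pool the spectrum into n_bands equal-width bands
        bands = np.array_split(mag, n_bands)
        embs.append(np.log1p([b.mean() for b in bands]))
    return np.asarray(embs)                   # shape (n_frames, n_bands)

# toy example: 2 s of a 440 Hz tone at 16 kHz, embedded at 25 fps
sr, fps = 16000, 25
t = np.arange(2 * sr) / sr
emb = audio_embeddings(np.sin(2 * np.pi * 440 * t), sr, fps)
print(emb.shape)  # (50, 16)
```

Because the hop length is derived from the video frame rate, the audio and heatmap sequences share a common time axis, which is what the synchronized ANN training below requires.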
Connecting Music and Emotions
- Implement an Artificial Neural Network (ANN) to connect the stochastic processes from audience emotions and music
- Train the ANN to predict emotions from music features and previous emotional states
- Evaluate the hypothesized causal influence of music on emotions based on prediction accuracy
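Before committing to a full ANN, the predictive link can be prototyped with a linear baseline: fit emotion_t ≈ W · [music_t, emotion_{t-1}] by least squares and check how much of the emotional trajectory is explained. The synthetic data below is purely illustrative; a real model would replace the least-squares fit with the trained network.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d_music, d_emo = 500, 8, 2

# synthetic stochastic processes standing in for the real embeddings:
# emotion follows its own past plus a linear response to the music
music = rng.normal(size=(T, d_music))
emotion = np.zeros((T, d_emo))
W_true = rng.normal(scale=0.3, size=(d_music, d_emo))
for t in range(1, T):
    emotion[t] = (0.7 * emotion[t - 1]
                  + music[t] @ W_true
                  + 0.05 * rng.normal(size=d_emo))

# inputs: current music features concatenated with the previous emotional state
X = np.hstack([music[1:], emotion[:-1]])
y = emotion[1:]
W, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ W
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean(axis=0)) ** 2).sum()
print(round(r2, 3))  # close to 1 on this synthetic data
```

High accuracy of such a next-step predictor is consistent with, but does not by itself prove, a causal influence of music on emotions; that is why the evaluation step is framed as testing a hypothesis.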
Explainable AI Analysis
- Apply explainable AI methods to the trained ANN
- Identify crucial input features (music or previous emotions) that drive emotion prediction
- Gain insights into how music influences emotional responses
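One model-agnostic route to the feature attribution described above is permutation importance: shuffle one input group (music features vs. previous emotional state) and measure how much prediction error increases. The sketch below applies it to a simple linear predictor; the data, shapes, and the choice of permutation importance as the XAI method are assumptions for illustration.

```python
import numpy as np

def permutation_importance(predict, X, y, cols, rng, n_repeats=10):
    """Mean increase in MSE when the given columns of X are shuffled,
    i.e. how much the model relies on that feature group."""
    base = ((y - predict(X)) ** 2).mean()
    increases = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, cols] = Xp[rng.permutation(len(Xp))][:, cols]
        increases.append(((y - predict(Xp)) ** 2).mean() - base)
    return float(np.mean(increases))

# toy setup: emotion depends strongly on music, weakly on its own past
rng = np.random.default_rng(2)
T, d_music, d_emo = 400, 6, 2
music = rng.normal(size=(T, d_music))
prev = rng.normal(size=(T, d_emo))
W_m = rng.normal(size=(d_music, d_emo))
y = music @ W_m + 0.1 * prev
X = np.hstack([music, prev])
W, *_ = np.linalg.lstsq(X, y, rcond=None)
predict = lambda Z: Z @ W

imp_music = permutation_importance(predict, X, y, slice(0, d_music), rng)
imp_prev = permutation_importance(predict, X, y,
                                  slice(d_music, d_music + d_emo), rng)
print(imp_music > imp_prev)  # music features dominate in this toy setup
```

Comparing the importance of the music group against the previous-emotion group is exactly the question posed above: which input actually drives the emotion prediction.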
Expected Outcomes
Future Work
- Extend to other music genres and live settings
- Incorporate physiological data for multimodal emotion analysis
- Refine models with larger datasets and advanced AI techniques