Master Thesis — Sound and Music Computing — Universitat Pompeu Fabra

Neural Engine Sound Synthesis with Physics-Informed Inductive Biases and Differentiable Signal Processing

Robin Doerfler¹ · Lonce Wyse²
¹robin.doerfler01@estudiant.upf.edu, Universitat Pompeu Fabra, Barcelona, Spain  ·  ²Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
Figure 1 — Model Overview

Control features (RPM, torque) and their deltas are temporally embedded by MLP blocks and a GRU. The resulting embeddings are decoded into synth parameters by further MLP blocks, mapped to valid parameter ranges by specialized heads (Pulse, Noise), upsampled to audio rate, and scaled by the conditioning signals. Audio is then synthesized by differentiable modules, and parameter updates are computed from multi-resolution spectral and harmonic losses.
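The caption mentions multi-resolution spectral losses. A minimal NumPy sketch of one common formulation (the one popularized by DDSP) is shown below, assuming Hann-windowed STFTs at four FFT sizes with 75% overlap and L1 distances on both linear and log magnitudes. The FFT sizes, hop, equal weighting, and the helper names `stft_mag` and `multi_resolution_stft_loss` are assumptions for illustration, not the thesis's settings, and the harmonic loss is not shown. NumPy does not compute gradients, so training would run the same computation in an autodiff framework.

```python
import numpy as np


def stft_mag(x, n_fft, hop):
    """Magnitude STFT of a 1-D signal with a Hann window (no padding)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))  # shape (n_frames, n_fft // 2 + 1)


def multi_resolution_stft_loss(pred, target, fft_sizes=(2048, 1024, 512, 256), eps=1e-7):
    """Sum over resolutions of L1 distances on linear and log magnitudes.

    Assumed configuration (hypothetical): hop = n_fft // 4, equal weights.
    Both signals must be at least max(fft_sizes) samples long.
    """
    loss = 0.0
    for n_fft in fft_sizes:
        hop = n_fft // 4  # 75% overlap between frames
        s_pred = stft_mag(pred, n_fft, hop)
        s_true = stft_mag(target, n_fft, hop)
        lin = np.mean(np.abs(s_pred - s_true))
        log = np.mean(np.abs(np.log(s_pred + eps) - np.log(s_true + eps)))
        loss += lin + log
    return loss
```

Using several FFT sizes trades time against frequency resolution: short windows track fast transients such as individual firing pulses, while long windows resolve closely spaced harmonics.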

Audio Examples

Three conditions are compared: Target (ground-truth engine recordings); HPN (a harmonic-plus-noise baseline with the same encoder–decoder structure but standard harmonic-plus-noise synthesis); and PTR (ours), the proposed Pulse-Train-Resonator model. Spectrograms are shown as a visual reference.
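For reference, a minimal sketch of what "standard harmonic-plus-noise synthesis" typically means is given below: frame-rate controls are linearly upsampled to audio rate, a bank of sinusoids at integer multiples of f0 is summed with a shared cumulative phase, and a noise branch is added. This is not the thesis's HPN implementation. The function names, the white-noise branch (DDSP-style models usually filter the noise), and the mapping from RPM to f0 via the firing frequency of an evenly firing four-stroke engine (RPM / 60 × cylinders / 2) are all assumptions.

```python
import numpy as np


def upsample(control, n_samples):
    """Linearly interpolate a frame-rate control curve to audio rate."""
    frame_pos = np.linspace(0.0, 1.0, num=len(control))
    sample_pos = np.linspace(0.0, 1.0, num=n_samples)
    return np.interp(sample_pos, frame_pos, control)


def harmonic_plus_noise(f0_frames, harm_amps_frames, noise_gain_frames,
                        sr=16000, n_samples=16000, seed=0):
    """Hypothetical harmonic-plus-noise synth.

    f0_frames:         (T,)   fundamental frequency in Hz per control frame
    harm_amps_frames:  (T, K) amplitude of each of K harmonics per frame
    noise_gain_frames: (T,)   gain of the (unfiltered) white-noise branch
    """
    f0 = upsample(f0_frames, n_samples)
    phase = 2 * np.pi * np.cumsum(f0) / sr  # instantaneous phase of the fundamental
    harmonic = np.zeros(n_samples)
    for k in range(1, harm_amps_frames.shape[1] + 1):
        amp = upsample(harm_amps_frames[:, k - 1], n_samples)
        amp = np.where(k * f0 < sr / 2, amp, 0.0)  # mute partials above Nyquist
        harmonic += amp * np.sin(k * phase)
    rng = np.random.default_rng(seed)
    noise = upsample(noise_gain_frames, n_samples) * rng.uniform(-1.0, 1.0, n_samples)
    return harmonic + noise


# Usage with hypothetical controls: an RPM ramp over 100 frames for a 4-cylinder engine.
rpm = np.linspace(900.0, 3000.0, 100)
f0 = rpm / 60.0 * 4 / 2  # assumed firing frequency of a four-stroke engine
amps = np.full((100, 20), 0.05)
noise_gain = np.full(100, 0.01)
y = harmonic_plus_noise(f0, amps, noise_gain)
```

Because every partial in such a model is tied to one smoothly varying f0, it struggles with the impulsive, cycle-to-cycle character of combustion; the pulse-train-plus-resonator design targets that structure more directly.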