AES 2026 International Conference on Automotive Audio — Detroit, MI — July 29–31, 2026

Gradient-Based Learning of Parametric Engine Sound Representations for Real-Time Resynthesis and Tuning on Embedded Systems

Robin Doerfler · Matthieu Kuntz · Clemens Zimmer
Figure 1 — Model Overview

Input audio, represented as VQT spectrograms, is encoded into a compact latent timbre representation and then decoded into RPM and torque gain curves. These gain curves form the shared parametrization used both for end-to-end training and for direct DSP export at inference. They are projected onto temporal soft-masks to yield time-varying amplitude envelopes, which drive a differentiable harmonic synthesizer (with f₀ derived from the RPM trajectory) and an ERB noise bank. Training minimizes a combined multi-resolution STFT and harmonic loss against the target audio.
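The harmonic path described in the caption (gain curves, soft-masks, envelopes, oscillator bank) can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions the caption does not state: gain curves are sampled on hypothetical RPM and torque grids; the temporal soft-masks are taken to be triangular (linear-interpolation) weights over those grids; RPM and torque gains are combined multiplicatively; all controls are already at audio rate; and f₀ = RPM/60, the crankshaft rotation frequency. A four-stroke engine also produces half orders, which `base_order=0.5` would capture. The names `project_gain_curve` and `harmonic_synth` are illustrative, not the authors' code. The envelope is linear in the gain curves, so the same operations are differentiable with respect to them when written in an autodiff framework.

```python
import numpy as np

def project_gain_curve(x, grid, curves):
    """Map a control trajectory x (T,) onto gain curves (K, G) sampled on grid (G,).

    Each time step gets a triangular soft-mask over the grid (linear
    interpolation weights), so the envelope is masks @ curves.T -> (T, K).
    Values outside the grid are clamped to the nearest endpoint.
    """
    idx = np.clip(np.searchsorted(grid, x) - 1, 0, len(grid) - 2)
    w = np.clip((x - grid[idx]) / (grid[idx + 1] - grid[idx]), 0.0, 1.0)
    masks = np.zeros((len(x), len(grid)))
    rows = np.arange(len(x))
    masks[rows, idx] = 1.0 - w
    masks[rows, idx + 1] = w
    return masks @ curves.T

def harmonic_synth(rpm, env, sr, base_order=1.0):
    """Additive oscillator bank: harmonic k runs at k * f0(t), f0 = base_order * RPM / 60 (assumed)."""
    f0 = base_order * rpm / 60.0                             # fundamental in Hz, per sample
    k = np.arange(1, env.shape[1] + 1)                       # harmonic numbers 1..K
    phase = 2.0 * np.pi * np.cumsum(f0) / sr                 # integrate f0 to get a smooth phase
    amps = np.where(np.outer(f0, k) < sr / 2, env, 0.0)      # mute harmonics above Nyquist
    return np.sum(amps * np.sin(np.outer(phase, k)), axis=1)

# Usage: a 1 s run-up from 1000 to 3000 RPM at constant (hypothetical) torque.
sr = 16000
t = np.arange(sr) / sr
rpm = 1000.0 + 2000.0 * t
torque = np.full_like(rpm, 150.0)                            # Nm
rpm_grid = np.linspace(500.0, 7000.0, 14)
torque_grid = np.linspace(0.0, 400.0, 9)
rng = np.random.default_rng(0)
K = 36
env = (project_gain_curve(rpm, rpm_grid, rng.uniform(0.0, 1.0, (K, 14)))
       * project_gain_curve(torque, torque_grid, rng.uniform(0.0, 1.0, (K, 9))))
y = harmonic_synth(rpm, env, sr)
```

Integrating f₀ with a cumulative sum, rather than multiplying a time axis by an instantaneous frequency, keeps the phase continuous during RPM sweeps, which avoids clicks on run-ups and run-downs.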
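The broadband path ("ERB noise bank") can be sketched as white noise split into bands spaced uniformly on the ERB-rate scale of Glasberg and Moore (1990), E(f) = 21.4 log₁₀(1 + 0.00437 f), with each band scaled by its own time-varying envelope. The band count, the 20 Hz lower edge, the brick-wall FFT-domain band splitting, and the function names are assumptions for illustration. The caption does not specify how the paper's noise bank is filtered.

```python
import numpy as np

def erb_rate(f):
    """ERB-rate scale (Cams) of Glasberg & Moore (1990) for frequency f in Hz."""
    return 21.4 * np.log10(1.0 + 0.00437 * f)

def inv_erb_rate(e):
    """Inverse of erb_rate: Cams -> Hz."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_band_edges(n_bands, f_min, f_max):
    """n_bands + 1 band edges equally spaced on the ERB-rate scale."""
    return inv_erb_rate(np.linspace(erb_rate(f_min), erb_rate(f_max), n_bands + 1))

def erb_noise_bank(env, sr, f_min=20.0, seed=0):
    """env: (T, B) per-sample band amplitudes. Returns (T,) band-shaped noise."""
    T, B = env.shape
    edges = erb_band_edges(B, f_min, sr / 2)
    noise_spec = np.fft.rfft(np.random.default_rng(seed).standard_normal(T))
    freqs = np.fft.rfftfreq(T, d=1.0 / sr)
    out = np.zeros(T)
    for b in range(B):
        band = (freqs >= edges[b]) & (freqs < edges[b + 1])   # brick-wall band mask
        out += env[:, b] * np.fft.irfft(noise_spec * band, n=T)
    return out

# Usage: 24 ERB bands whose envelopes all fade in over 1 s.
sr = 16000
B = 24
env = np.tile(np.linspace(0.0, 1.0, sr)[:, None], (1, B))
noise = erb_noise_bank(env, sr)
```

ERB spacing allocates many narrow bands at low frequencies and fewer wide bands at high frequencies, roughly matching auditory frequency resolution. This keeps the number of noise parameters small for embedded export.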
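For the multi-resolution STFT term of the training objective, a common formulation (popularized by Parallel WaveGAN) sums a spectral-convergence term and an L1 log-magnitude term at each resolution and then averages over resolutions. The sketch below assumes that formulation and uses illustrative (FFT size, hop) pairs. The harmonic loss is not specified in the caption, so it is omitted. NumPy is used for readability; training would use an autodiff framework.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude STFT with a Hann window; frames that do not fit are dropped."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.abs(np.fft.rfft(x[idx] * win, axis=-1))

def mr_stft_loss(y_hat, y, resolutions=((512, 128), (1024, 256), (2048, 512)), eps=1e-7):
    """Spectral convergence + log-magnitude L1, averaged over STFT resolutions."""
    total = 0.0
    for n_fft, hop in resolutions:
        S, S_hat = stft_mag(y, n_fft, hop), stft_mag(y_hat, n_fft, hop)
        sc = np.linalg.norm(S - S_hat) / (np.linalg.norm(S) + eps)
        mag = np.mean(np.abs(np.log(S + eps) - np.log(S_hat + eps)))
        total += sc + mag
    return total / len(resolutions)

# Usage: identical signals give zero loss; a scaled copy does not.
rng = np.random.default_rng(1)
y_ref = rng.standard_normal(16000)
loss_same = mr_stft_loss(y_ref, y_ref)
loss_scaled = mr_stft_loss(0.5 * y_ref, y_ref)
```

Combining several window lengths trades time resolution (short windows, sharp transients) against frequency resolution (long windows, closely spaced engine harmonics), so neither aspect dominates the gradient.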

Audio Examples

Stimuli comprise three conditions: Target — ground-truth engine recordings; EONE (ours) — full reconstructions from the proposed model; and EOE — truncated reconstructions retaining only the 36 lowest harmonics, omitting broadband components. EOE represents a baseline for conventional automotive sound design. Spectrograms are shown as a visual reference.
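Deriving the EOE condition from EONE parameters amounts to keeping the first 36 harmonic envelopes and silencing the noise path. This is a minimal sketch assuming harmonic and noise envelopes are stored as (time, component) arrays. The function names are hypothetical. The bandwidth helper also assumes f₀ = RPM/60, so its numbers only illustrate how the cutoff scales with engine speed.

```python
import numpy as np

def eoe_from_eone(harm_env, noise_env, n_keep=36):
    """Keep the n_keep lowest harmonic envelopes and zero the broadband (noise) envelopes."""
    return harm_env[:, :n_keep].copy(), np.zeros_like(noise_env)

def top_retained_frequency(rpm, n_keep=36, base_order=1.0):
    """Frequency (Hz) of the highest kept harmonic, assuming f0 = base_order * RPM / 60."""
    return n_keep * base_order * rpm / 60.0

# Usage: under the f0 = RPM/60 assumption, the kept band ends at 1800 Hz at 3000 RPM.
harm, noise = eoe_from_eone(np.ones((10, 64)), np.ones((10, 24)))
f_top_3000 = top_retained_frequency(3000.0)
f_top_6000 = top_retained_frequency(6000.0)
```

Because the cutoff moves with RPM, a harmonic-only baseline loses proportionally more high-frequency content at low engine speeds, and it never reproduces the broadband components in any case.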