BNMusic: Blending Environmental Noises into Personalized Music

Chi Zuo1, Martin B. Møller2, Pablo Martínez-Nuevo2, Huayang Huang1, Yu Wu1,*, Ye Zhu3,4
1School of Computer Science, Wuhan University, China
2Bang & Olufsen A/S, Denmark
3Department of Computer Science, Princeton University, USA
4LIX, École Polytechnique, IP Paris, France
*Corresponding author

Overall pipeline of our proposed BNMusic framework, which achieves noise blending with frozen music generators. The two stages of our approach are marked with different background colors. In Stage 1, our approach generates music that aligns with the noise; in Stage 2, we adaptively amplify the music signal to achieve an ideal and reasonable blend with the noise.
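For intuition, here is a minimal Python sketch of this two-stage pipeline, assuming librosa for mel-spectrogram extraction. The function names generate_music and adaptive_gain are hypothetical stand-ins, not BNMusic's actual interfaces.

import numpy as np
import librosa

def generate_music(prompt: str, noise_mel: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen text-to-music generator conditioned on the
    noise mel-spectrogram (Stage 1); plug in a real pretrained model here."""
    raise NotImplementedError

def blend_noise_into_music(noise: np.ndarray, prompt: str,
                           sr: int = 16000) -> np.ndarray:
    # Stage 1: represent the noise as a log-mel spectrogram and generate
    # music that encapsulates its musical essence (e.g., its rhythm).
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=noise, sr=sr, n_mels=80))
    music = generate_music(prompt, noise_mel=mel)

    # Stage 2: adaptively amplify the music so it dominates the noise
    # without excessive loudness (see the gain sketch after the abstract),
    # then mix the amplified music with the noise.
    g = adaptive_gain(music, noise, sr)
    return g * music + noise[:len(music)]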

Abstract

Acoustic masking is a conventional audio-engineering technique for reducing the annoyance of environmental noises by covering them with other dominant yet less intrusive sounds. However, misalignment between the dominant sound and the noise, such as mismatched downbeats, often requires an excessive volume increase to achieve effective masking. Motivated by recent advances in cross-modal generation, we introduce an alternative to acoustic masking that reduces the noticeability of environmental noises by blending them into personalized music generated from user-provided text prompts. Following the paradigm of music generation with mel-spectrogram representations, we propose the Blending Noises into Personalized Music (BNMusic) framework, which operates in two stages. The first stage synthesizes a complete piece of music, in mel-spectrogram representation, that encapsulates the musical essence of the noise. The second stage adaptively amplifies the generated music to further reduce noise perception and enhance the blending effect while preserving auditory quality. Comprehensive evaluations on MusicBench, EPIC-SOUNDS, and ESC-50 demonstrate the effectiveness of our framework, highlighting its ability to blend environmental noise with rhythmically aligned, adaptively amplified, and enjoyable music segments, minimizing the noticeability of the noise and improving the overall acoustic experience.
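The adaptive amplification in Stage 2 can be pictured as searching for the smallest gain at which the music energetically dominates the noise across most time-frequency bins, subject to a loudness cap that preserves auditory quality. Below is a minimal sketch of one such criterion, assuming a mel-domain energy comparison; the coverage and max_gain_db parameters are illustrative, not the paper's actual objective or values.

import numpy as np
import librosa

def adaptive_gain(music: np.ndarray, noise: np.ndarray, sr: int = 16000,
                  coverage: float = 0.95, max_gain_db: float = 12.0) -> float:
    """Smallest linear gain such that the amplified music's mel energy
    exceeds the noise's in at least `coverage` of time-frequency bins,
    capped at `max_gain_db` to avoid excessive loudness.

    Illustrative criterion only; BNMusic's actual objective may differ."""
    n = min(len(music), len(noise))
    mel_music = librosa.feature.melspectrogram(y=music[:n], sr=sr)
    mel_noise = librosa.feature.melspectrogram(y=noise[:n], sr=sr)

    for gain_db in np.arange(0.0, max_gain_db + 0.5, 0.5):
        g = 10.0 ** (gain_db / 20.0)
        # A power spectrogram scales with the square of the amplitude gain.
        dominated = (g ** 2) * mel_music >= mel_noise
        if dominated.mean() >= coverage:
            return g
    return 10.0 ** (max_gain_db / 20.0)  # fall back to the loudness cap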

Video

Here is a demonstration of samples generated by our BNMusic model, showcasing its performance under nine different combinations of noise conditions and text prompts.

BibTeX

@inproceedings{czuo2025bnmusic,
  author    = {Chi Zuo and Martin B. Møller and Pablo Martínez-Nuevo and Huayang Huang and Yu Wu and Ye Zhu},
  title     = {BNMusic: Blending Environmental Noises into Personalized Music},
  booktitle = {Proceedings of the Thirty-Ninth Conference on Neural Information Processing Systems (NeurIPS 2025)},
  year      = {2025},
  url       = {https://arxiv.org/abs/2506.10754}
}