A neural audio style transfer project that generates machine-like creature voices for Lies of P.
This project explored neural audio style transfer for designing mechanical creature voices in the world of Lies of P. The central challenge was that the monsters were intended to sound machine-like rather than organic, yet raw mechanical recordings lacked the timing, force, and expressive motion needed for believable creature performances.
Instead of relying only on conventional sound design techniques such as filtering, layering, and manual editing, I developed a neural audio filter that transformed monster vocal samples into mechanical textures while retaining the original performance cues that made them feel alive.
The system was built on a non-parallel spectrogram-based audio translation framework and adapted for game audio rather than standard voice conversion.
My main focus was not simply to transfer texture but to preserve the dynamics and accent structure of the source signal, so that the output would still feel like an intentional creature performance rather than a layer of generic machine noise.
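One way to make "preserving dynamics and accent structure" concrete is an auxiliary loss on the loudness envelope. The following is a minimal sketch under my own assumptions, not the objective actually used in the project: the function names, the log-energy envelope, and the equal weighting of the two terms are all hypothetical. It is written in NumPy for readability; in training, the same arithmetic would need to be expressed with the framework's differentiable operations.

```python
import numpy as np


def frame_log_energy(mag_spec, eps=1e-8):
    """Per-frame log energy of a magnitude spectrogram shaped (freq_bins, frames)."""
    return np.log(np.sum(mag_spec ** 2, axis=0) + eps)


def dynamics_loss(source_spec, output_spec):
    """Hypothetical auxiliary loss comparing loudness contours.

    contour_term: difference between mean-centered log-energy envelopes,
                  so a constant gain change is not penalized.
    accent_term:  difference between frame-to-frame envelope changes,
                  which emphasizes attacks and accents.
    """
    src_env = frame_log_energy(source_spec)
    out_env = frame_log_energy(output_spec)
    contour_term = np.mean(np.abs((src_env - src_env.mean()) - (out_env - out_env.mean())))
    accent_term = np.mean(np.abs(np.diff(src_env) - np.diff(out_env)))
    return contour_term + accent_term


# Synthetic illustration: same envelope with a new texture vs. a flat envelope.
rng = np.random.default_rng(0)
frames = 64
envelope = np.abs(np.sin(np.linspace(0.0, 3.0 * np.pi, frames))) + 0.1
source = rng.uniform(0.5, 1.5, size=(32, frames)) * envelope
kept = rng.uniform(0.5, 1.5, size=(32, frames)) * envelope * 2.0  # new texture, same motion
flattened = rng.uniform(0.5, 1.5, size=(32, frames))              # new texture, motion lost

print("dynamics kept:     ", dynamics_loss(source, kept))
print("dynamics flattened:", dynamics_loss(source, flattened))
```

A term like this would be added to the translation objective with a tunable weight, so that the model remains free to change timbre while being penalized for smoothing out the performance.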
To support broader artistic variation, I also organized the reference mechanical sound library into a small number of stylistic groups, allowing the team to explore multiple output directions with different machine-like characteristics.
- Adapted a non-parallel spectrogram translation approach for mechanical creature sound design in a production setting.
- Modified the training objective to better preserve the temporal dynamics and expressive contour of the original vocal input during style transfer.
- Structured the reference sound library into multiple stylistic clusters to generate different categories of mechanical sound candidates rather than a single uniform output.
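The grouping of the reference library can be illustrated with a small clustering sketch. This is an assumption-laden example rather than the method used in production: the two descriptors (normalized spectral centroid and spectral flatness), the helper names, and the use of k-means with farthest-point initialization are all my own illustrative choices.

```python
import numpy as np


def clip_features(mag_spec, eps=1e-8):
    """Summarize a magnitude spectrogram (freq_bins, frames) as two descriptors:
    normalized spectral centroid (brightness) and spectral flatness (noisiness)."""
    power = mag_spec ** 2 + eps
    bins = np.arange(mag_spec.shape[0])[:, None]
    centroid = (bins * power).sum(axis=0) / power.sum(axis=0)
    flatness = np.exp(np.log(power).mean(axis=0)) / power.mean(axis=0)
    return np.array([centroid.mean() / mag_spec.shape[0], flatness.mean()])


def farthest_point_init(features, k):
    """Deterministic init: start at the first clip, then repeatedly pick the
    clip farthest from all centers chosen so far."""
    centers = [features[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(features - c, axis=1) for c in centers], axis=0)
        centers.append(features[dists.argmax()])
    return np.array(centers)


def assign(features, centers):
    """Index of the nearest center for every clip."""
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def group_styles(features, k, iters=50):
    """Plain k-means over per-clip feature vectors; returns (labels, centers)."""
    centers = farthest_point_init(features, k)
    for _ in range(iters):
        labels = assign(features, centers)
        new_centers = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(features, centers), centers


# Synthetic stand-ins for reference clips: low tonal hums vs. bright noisy hiss.
rng = np.random.default_rng(0)
clips = []
for _ in range(4):
    hum = np.full((64, 20), 1e-3)
    hum[3:6, :] = rng.uniform(0.8, 1.2, size=(3, 20))
    clips.append(hum)
for _ in range(4):
    hiss = np.full((64, 20), 1e-3)
    hiss[32:, :] = rng.uniform(0.5, 1.5, size=(32, 20))
    clips.append(hiss)

features = np.array([clip_features(c) for c in clips])
labels, centers = group_styles(features, k=2)
print(labels)
```

Each resulting group can then serve as a separate style target, which is what allows one source performance to yield several distinct categories of mechanical candidates.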
This project turned a difficult manual sound design problem into a reusable generative tool for creating stylized sound samples.
The generated outputs were used as candidate assets within the audio production workflow of Lies of P, helping the team produce mechanical creature sounds more efficiently while maintaining consistency with the game's worldbuilding and tone.
A key lesson from this project was that perceptually convincing audio transfer depended less on matching surface texture than on preserving the motion and expressive structure of the original source signal.