Research

SightSound-R1: Cross-modal reasoning distillation from vision to audio-language models.

Layer-wise Minimal Pair Probing: Revealing grammatical and conceptual hierarchies inside speech representations.

Open Source

ChatVITS Cyberpunk 2077: Creative voice synthesis combining GPT dialogue planning with VITS voice cloning.