Visemes: Difference between revisions

From VRChat Wiki
Revisions compared: ʙɪɢ․ (minor edit, no edit summary) and DAG-XR (edit summary: "Light proofread.")

Revision as of 22:58, 25 August 2025

Visemes (a portmanteau of "visual phonemes") are an avatar feature that mimics lip and mouth movement in sync with human speech.

Within Unity, visemes are implemented as a set of shape keys (blend shapes) set up on an avatar. The Oculus LipSync library defines 15 mouth-shape targets that it drives while you speak.

VRChat passes your live microphone audio through that library, converts the sound into a single integer (0-14) every frame, and writes it into the built-in Animator parameter Viseme (often shown in the docs as "VisemeOculus"). Your avatar's FX layer (or the VRChat SDK's default layer) turns that number into blend-shape or bone animation so the mouth appears to pronounce your words in real time.

Viseme Slots

Index  Name  Phonemes           Examples               Mouth description
0      sil   (silence)          (none)                 Neutral, lips relaxed
1      pp    p, b, m            put, bat, mat          Lips fully closed, slight pout
2      ff    f, v               fat, vat               Lower lip touches upper teeth
3      th    th                 that                   Tongue between teeth
4      dd    t, d               tip, dip               Tip of tongue touches ridge
5      kk    k, g               call, gas              Back of tongue touches palate
6      ch    ch, sh, ts, dz, j  chair, she, its, join  Lips rounded, jaw slightly down
7      ss    s, z               sit, zoom              Teeth almost together, lips wide
8      nn    n, l               not, lot               Tongue presses ridge, lips apart
9      rr    r                  red                    Lips slightly rounded, cheeks firm
10     aa    ah, aw             fast, father           Jaw open, oval mouth
11     e     eh                 men                    Mouth wider, mid-open
12     i     ih, ee             tip, tea               Lips stretched, jaw higher
13     o     oh                 toe                    Lips rounded, jaw mid-open
14     u     ou                 boot                   Lips rounded, slightly forward

Oculus Docs use the longer spellings ih/oh/ou; VRChat's parameter list trims them to i/o/u.
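
The slot numbering and the two spelling conventions can be captured in a small lookup. The following is a minimal Python sketch for illustration only; it is not part of the VRChat SDK or the Oculus library, and the names VISEME_NAMES, OCULUS_ALIASES, viseme_name, and viseme_index are hypothetical helpers introduced here.

```python
# Viseme slots in index order, using the short names from the table above.
VISEME_NAMES = (
    "sil", "pp", "ff", "th", "dd", "kk", "ch", "ss",
    "nn", "rr", "aa", "e", "i", "o", "u",
)

# Longer Oculus spellings mapped to the trimmed names (per the note above).
OCULUS_ALIASES = {"ih": "i", "oh": "o", "ou": "u"}


def viseme_name(index: int) -> str:
    """Return the slot name for a Viseme parameter value (0-14)."""
    if not 0 <= index < len(VISEME_NAMES):
        raise ValueError(f"viseme index out of range: {index}")
    return VISEME_NAMES[index]


def viseme_index(name: str) -> int:
    """Return the slot index for a name, accepting the longer Oculus spellings."""
    short = OCULUS_ALIASES.get(name, name)
    # tuple.index raises ValueError for names that are not viseme slots.
    return VISEME_NAMES.index(short)
```

For example, viseme_index("ou") and viseme_index("u") both resolve to slot 14, so either spelling can be used when cross-referencing the two sets of documentation.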

Wiring Avatar

  1. Add blend shapes (shape keys) or a jaw bone. Each shape key should be named exactly like the codes above (case-sensitive in Unity). Keep the sil key, even if it only moves one vertex, to stop Unity from deleting it on import.
  2. Set the Avatar Descriptor's Lip Sync mode to "Viseme Blend Shape". Click Auto Detect! first; if the SDK guesses wrong, pick the correct shape key for every slot in the dropdown.
  3. Test in‑editor: play the scene, enable the Lip Sync preview on the descriptor, or simply talk into your mic while the Game view is active.
  4. Fine‑tune in your FX Animator (optional). The int parameter Viseme updates every frame; you can:
    • Drive a 15‑way BlendTree that weights each shapekey.
    • Branch to custom mouth animations (e.g., a “big scream” version of aa) when volume is high by also reading the Voice float (0‑1).
  5. Performance tip: keep viseme blend‑shapes on a separate head mesh; the GPU only has to update the vertices it actually changes.
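
The naming check in step 1 and the optional branching in step 4 can be sketched in plain Python. This only models the logic; in a real avatar it lives in the Avatar Descriptor and the Animator Controller, not in a script. The helper names, the "aa_scream" clip name, and the 0.8 threshold are assumptions made for this example.

```python
# Short viseme names in slot order (0-14), as listed in the table above.
REQUIRED_VISEMES = (
    "sil", "pp", "ff", "th", "dd", "kk", "ch", "ss",
    "nn", "rr", "aa", "e", "i", "o", "u",
)


def missing_visemes(shape_key_names):
    """Return required viseme names absent from a mesh's shape keys.

    The comparison is case-sensitive, mirroring the exact-name rule in step 1.
    """
    present = set(shape_key_names)
    return [name for name in REQUIRED_VISEMES if name not in present]


def pick_mouth_clip(viseme, voice, scream_threshold=0.8):
    """Choose a mouth animation from the Viseme int and the Voice float (0-1).

    "aa_scream" and the default threshold are hypothetical; they stand in for
    the "big scream" variant of aa mentioned in step 4.
    """
    name = REQUIRED_VISEMES[viseme]
    if name == "aa" and voice >= scream_threshold:
        return "aa_scream"
    return name
```

Running missing_visemes over the shape key names exported from a modelling tool gives a quick list of slots that Auto Detect would not be able to fill.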

Troubleshooting quick‑hits

Symptom                                    Likely cause                                        Fix
Mouth never moves                          Lip Sync mode still on Jaw Flap or Default          Switch to Viseme Blend Shape
Wrong shapes (e.g., "th" looks like "ff")  Mismatched dropdown slots                           Re-assign each viseme in the Descriptor
Shapes snap instead of blend               Using an Int blend tree with thresholds too close   Use a 1D float tree driven by 0-14 or add smoothing
Shapes missing on Quest                    Head mesh not included in the Android build         In Build tab, mark head mesh for both PC & Android
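
The "add smoothing" fix for snapping shapes can be pictured as easing each blend-shape weight toward the active viseme instead of jumping to it. Below is a minimal Python sketch of that idea using simple exponential smoothing; the function name, the rate value, and the per-frame update model are assumptions for illustration, and a real avatar would implement the equivalent with Animator transitions or blend trees.

```python
def smooth_weights(current, target_viseme, rate=0.3):
    """Move each weight a fraction `rate` of the way toward a one-hot target.

    current        list of 15 weights (0.0-1.0), one per viseme slot
    target_viseme  the Viseme int (0-14) reported for this frame
    rate           hypothetical smoothing factor; 1.0 reproduces the snap
    """
    if not 0 <= target_viseme < len(current):
        raise ValueError(f"viseme index out of range: {target_viseme}")
    return [
        weight + ((1.0 if slot == target_viseme else 0.0) - weight) * rate
        for slot, weight in enumerate(current)
    ]


# Start at rest (sil fully on), then hold the "aa" viseme (slot 10) for a few frames.
weights = [1.0] + [0.0] * 14
for _ in range(5):
    weights = smooth_weights(weights, target_viseme=10)
```

A lower rate gives softer transitions at the cost of the mouth lagging behind fast speech, so the value is worth tuning by eye.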

Take‑away

Visemes are simply numbered mouth cues. Name 15 blend‑shapes (or equivalent bone poses) to match the Oculus set, point your Avatar Descriptor at them, and VRChat’s built‑in Viseme parameter will make your character lip‑sync automatically—no extra scripts required.

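Resources

  • Animator Parameters on VRChat's Creator Documentation: https://creators.vrchat.com/avatars/animator-parameters/#viseme-values
  • Viseme Reference on Meta Developers' Documentation: https://developers.meta.com/horizon/documentation/unity/audio-ovrlipsync-viseme-reference

See also

  • Avatars
  • Blendshapes
  • VRChat Creator Companion
  • VRChat SDK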