What’s VRM? and related things

12/23/21 - Jessica Pixel/

Just a brain dump of some important concepts I think I understand enough to write down and share. Hopefully this will be helpful. If something is wrong, please let me know. If it’s helpful, you can also let me know.



VRM is a format for avatars that allows for interoperability by standardizing many important components. I’m not able to find a solid answer on what the V, R, and M stand for exactly, but virtual reality is probably in there.

Container format (computing) - Wikipedia

A single VRM file is a container, like FBX, ZIP, or MP4. It is one file that contains other individual files that take care of various parts of the avatar. Parts like:

Permissions, License

License settings in VRM

VRM allows a standard way to record how a particular avatar can be used. This is a courtesy type of feature at the moment, but it does give software the ability to deny or allow use of specific VRM avatars depending on the embedded license. Of course, if you can see it on your computer, you can have it, but deterring casual theft and misuse of your intellectual property is good. VRM avatars can be created with owner-only usage perms, specific copyright perms, commercial OK/NG, okay or not for 18+ or violent usage, and a few other things.
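Here is a rough sketch of how software might read that license info. It assumes a `.vrm` file is laid out as a binary glTF (GLB) container and that the license sits under `extensions.VRM.meta` with VRM 0.x style field names (`allowedUserName`, `commercialUssageName`, `violentUssageName`, spelled that way). Those field names are from my reading of the 0.x format, so double check them against the spec. `may_use` and `build_demo_glb` are made-up helper names, and the policy is deliberately simplistic.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF"
JSON_CHUNK = 0x4E4F534A  # ASCII "JSON"


def read_glb_json(data: bytes) -> dict:
    """Return the JSON chunk of a binary glTF (.glb / .vrm) file as a dict."""
    magic, _version, _total_length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a binary glTF container")
    chunk_length, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != JSON_CHUNK:
        raise ValueError("first chunk is not JSON")
    return json.loads(data[20:20 + chunk_length].decode("utf-8"))


def may_use(gltf: dict, *, is_author: bool, commercial: bool, violent: bool) -> bool:
    """Hypothetical policy check against VRM 0.x style license metadata."""
    meta = gltf.get("extensions", {}).get("VRM", {}).get("meta", {})
    # Missing fields are treated as the most restrictive setting.
    if meta.get("allowedUserName", "OnlyAuthor") == "OnlyAuthor" and not is_author:
        return False
    if commercial and meta.get("commercialUssageName", "Disallow") != "Allow":
        return False
    if violent and meta.get("violentUssageName", "Disallow") != "Allow":
        return False
    return True


def build_demo_glb(gltf: dict) -> bytes:
    """Pack a dict into a minimal GLB so the example runs without a real file."""
    payload = json.dumps(gltf).encode("utf-8")
    payload += b" " * (-len(payload) % 4)  # chunks are padded to 4 bytes
    header = struct.pack("<III", GLB_MAGIC, 2, 12 + 8 + len(payload))
    return header + struct.pack("<II", len(payload), JSON_CHUNK) + payload


demo_meta = {
    "allowedUserName": "Everyone",
    "commercialUssageName": "Disallow",
    "violentUssageName": "Allow",
}
demo_file = build_demo_glb({"extensions": {"VRM": {"meta": demo_meta}}})
demo_gltf = read_glb_json(demo_file)
```

With a real avatar you would pass the bytes of the `.vrm` file to `read_glb_json` instead of building a demo file.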

Materials, Shaders


A material is a combo of images and shaders applied to a face of a mesh. Shaders are code that steps through each pixel of a collection of related textures and does the math needed to make that pixel (and the rest) in the image look how you need. Shaders can combine images, change transparency and/or color, change how much light is reflected or how much something glows, and move where a smaller texture layered on top of the main texture is located (ex: texture based eye movement). This is FAR from a complete list of things shaders can do. Even the MToon shader that goes along with VRM has more features than I mentioned, and the VRM1.0 version of MToon has EVEN MORE features. VRM only supports the MToon shader, and the unlit and standard Unity shaders. A low number of allowed shaders allows for greater interoperability, making sure your character renders the same on all platforms.

Materials are assigned to a face of a mesh. *an ATLAS is all your textures put together into one texture in a sort of map (hence the name). the atlased mesh is edited (UVs are changed?? i don't exactly know) to know how to use the new single map texture instead of the multiple textures before. selecting the option in VRoid Studio to reduce the number of textures to 8 or 2 is atlas-ing the mesh. cats can atlas your textures/meshes but it seems to forget to atlas the NORMALS texture??*
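To make the atlas idea a bit more concrete, here is a minimal sketch of the UV change, assuming the atlas is a simple square grid of equally sized tiles. Real atlas tools pack textures in smarter ways, and `atlas_uv` is a made-up name, so treat this as an illustration only. Each original texture's 0 to 1 UV range gets squeezed into its tile of the bigger map.

```python
def atlas_uv(u: float, v: float, tile_index: int, tiles_per_row: int) -> tuple:
    """Map a UV from an original texture into its tile of a square grid atlas."""
    scale = 1.0 / tiles_per_row           # each tile covers this much of the atlas
    col = tile_index % tiles_per_row      # which column the texture landed in
    row = tile_index // tiles_per_row     # which row the texture landed in
    return (col + u) * scale, (row + v) * scale


# The center of the 4th texture (index 3) in a 2x2 atlas:
center_of_last_tile = atlas_uv(0.5, 0.5, tile_index=3, tiles_per_row=2)
```

Every vertex UV on the mesh would get this treatment, using the tile of whichever texture it used before.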

*there are several images that shaders can use for their math and i don't know them all. this method is called PBR:*

Physically based rendering - Wikipedia


A normal map is the mostly blue texture with green and red and colors in between that looks like an inverted version of the regular texture it's associated with. The shader in the material uses this texture to figure out what direction light should be reflected when the surface is viewed from a specific direction. Shaders also use these numbers to know which pixel of a matcap texture to use when figuring out how a pixel should look.

*Take this next part lightly. I don’t know the actual numbers or orders of variables, etc. This is just to hopefully help you understand what it does.*

The <r,g,b> color of each pixel in the normal map is used as an <x,y,z> value in the math the shader is doing to figure out how light on the final texture should look. Normal maps are used to make something look “bumpy” without adding actual bumps to the mesh and increasing your triangle count, by faking what happens with real light on real surfaces in the real world. You can really tell when a mesh is only bumpy because of normals, because viewing it at a shallow angle breaks the effect. There are more images like normal maps, such as SPECULAR, that shaders use.
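A small sketch of that idea, assuming the common convention where each 0 to 255 color channel maps to a -1 to 1 direction component, and using the basic Lambert (dot product) lighting formula. This is not the MToon math, just the simplest version of "use the color as a direction and compare it to the light". The function names are made up.

```python
import math


def decode_normal(r: int, g: int, b: int) -> tuple:
    """Turn an 8-bit RGB pixel into a unit-length <x, y, z> direction."""
    x, y, z = (c / 255.0 * 2.0 - 1.0 for c in (r, g, b))
    length = math.sqrt(x * x + y * y + z * z)
    return x / length, y / length, z / length


def diffuse(normal, light_dir) -> float:
    """Lambert term: 1.0 facing the light, 0.0 edge-on or facing away."""
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))


# The typical "flat" normal map color is that light blue, (128, 128, 255).
flat = decode_normal(128, 128, 255)
brightness = diffuse(flat, (0.0, 0.0, 1.0))  # light shining straight at the surface
```

A pixel that is more red or green tilts the direction sideways, so it catches less (or more) of the light than its flat neighbors, which is what reads as a bump.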


A specular map is similar to a normal map: the color of each pixel represents a 0.0 to 1.0 float type of number the shader uses to figure out how strongly to reflect light. This image looks like a black and white version of the regular one, but only the shiny stuff is visible.
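As a sketch of how that 0.0 to 1.0 number might get used, here is the classic Blinn-Phong highlight formula with the specular map value as a multiplier. I'm swapping in Blinn-Phong because it is simple to show; it is not necessarily what MToon or the Unity standard shader does. `shininess` is an assumed tuning knob.

```python
import math


def specular(normal, light_dir, view_dir, strength: float, shininess: float = 32.0) -> float:
    """Blinn-Phong highlight, scaled by the specular map value for this pixel."""
    half = [l + v for l, v in zip(light_dir, view_dir)]  # halfway between light and eye
    length = math.sqrt(sum(h * h for h in half))
    if length == 0.0:
        return 0.0
    n_dot_h = max(0.0, sum(n * h / length for n, h in zip(normal, half)))
    return strength * n_dot_h ** shininess


up = (0.0, 0.0, 1.0)
shiny_pixel = specular(up, up, up, strength=0.5)  # gray pixel in the specular map
matte_pixel = specular(up, up, up, strength=0.0)  # black pixel in the specular map
```

A black pixel in the specular map kills the highlight entirely, which is why only the shiny stuff shows up in that image.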


A mesh is an item with no shape keys that can be rigged to bend with the skeleton (turn your head, bend your elbow) or attached to a particular bone (a hat attached to the head bone, a ring to a finger bone). VRM requires triangulated mesh (at least I get errors in Blender trying to export something with quads). A mesh can have one or more faces, and different faces can display different materials.

Skinned Mesh

Mesh with shape keys. When making avatars for VR, minimize the number of skinned meshes by combining them or removing them *or some other option i dont know*. If you're Vtubing, it's up to your computer or software I guess.


Armature (computer animation) - Wikipedia

The armature is the skeleton of the avatar, made of bones. VRM has some requirements regarding what bones need to be included. Abstractly: there is a ‘list’ within the VRM (one of the files within the container) of which bones in the avatar’s armature correspond to standard bones, and standard bones allow software to easily manipulate those bones. If an avatar isn't shaped too differently from a humanoid, most motions should look fine on most VRM avatars. Any part of the avatar mesh that needs to move/bend (clothes and body when in any position other than T/A pose, hair when the head moves, boob physics) will be rigged with ‘weight’ to the bones of the armature.
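Here is roughly what using that ‘list’ could look like, assuming the VRM 0.x layout where `extensions.VRM.humanoid.humanBones` holds entries pairing a standard bone name with a node index. The node names in the demo data are invented, and `resolve_bones` is a hypothetical helper.

```python
def resolve_bones(gltf: dict) -> dict:
    """Map standard bone names to this avatar's own node names."""
    nodes = gltf["nodes"]
    human_bones = gltf["extensions"]["VRM"]["humanoid"]["humanBones"]
    return {entry["bone"]: nodes[entry["node"]]["name"] for entry in human_bones}


# Made-up avatar: the artist named the bones however they liked.
demo_avatar = {
    "nodes": [{"name": "Armature"}, {"name": "Root_Hips"}, {"name": "J_Head_01"}],
    "extensions": {"VRM": {"humanoid": {"humanBones": [
        {"bone": "hips", "node": 1},
        {"bone": "head", "node": 2},
    ]}}},
}
bone_lookup = resolve_bones(demo_avatar)
```

Software that wants to turn the head only has to ask for `"head"`, no matter what the bone is called in a particular avatar.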

Rigging, Weights

Vertex painting - Wikipedia

This is assigning how each bone in an armature will affect the position of the vertices of the mesh around it. ‘Weight painting’ is a way to do this visually, with colors showing the average of the 0.0 to 1.0 float type numbers from each vertex, representing how strong an effect moving that bone has on that mesh. Selecting some mesh in Blender and assigning it to a vertex group is doing a similar thing. The slider below that list can be changed before assigning, but if you're assigning something this way it's generally something ‘solid’ (nails to finger tip bones, iris and highlight to eye bones, hat to head bone) that will move directly along with a bone without bending.
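A tiny 2D sketch of what those weights do, assuming the usual "linear blend skinning" approach: move the vertex as if it were fully attached to each bone, then average the results by weight. The bone names and the `rotate_about` helper are invented for the example.

```python
import math


def rotate_about(point, pivot, degrees):
    """Rotate a 2D point around a pivot (like a forearm bending at the elbow)."""
    a = math.radians(degrees)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    return (pivot[0] + dx * math.cos(a) - dy * math.sin(a),
            pivot[1] + dx * math.sin(a) + dy * math.cos(a))


def skin_vertex(vertex, bone_moves, weights):
    """Blend where each bone would put the vertex, using the painted weights."""
    total = sum(weights.values())
    x = y = 0.0
    for bone, weight in weights.items():
        px, py = bone_moves[bone](vertex)
        x += px * weight / total
        y += py * weight / total
    return x, y


bone_moves = {
    "upper_arm": lambda p: p,                                # did not move
    "forearm": lambda p: rotate_about(p, (1.0, 0.0), 90.0),  # bent 90 degrees
}
wrist = skin_vertex((2.0, 0.0), bone_moves, {"forearm": 1.0})
near_elbow = skin_vertex((2.0, 0.0), bone_moves, {"upper_arm": 0.5, "forearm": 0.5})
```

A weight of 1.0 is the ‘solid’ case (the vertex rides along with one bone), while split weights give the in-between bend you see around joints.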

If meshes share a rig they will move together more fluidly. There can still be clipping even if they share the same rig, depending on the motion, etc. Adjusting some numbers by weight painting can fix clipping. *i can't say anything more on rigging and what i have said take with a grain of salt because i barely understand how to use blender.*


GitHub - vrm-c/UniVRM

Unity is generally the last step, after meshing, texturing, and rigging your avatars in VRoid Studio/Blender/whatever else you know how to use. Those other programs can also import and export VRM. UniVRM is the .unitypackage used to turn models into VRMs.


The VRM Consortium is kind of the people in charge of the direction of the format. Keep an eye on their websites. VRM is about to enter the 1.0 beta phase (currently 0.9, and a lot of content is older). *There’s probably stuff like this for other formats, but I don’t know much about it AND I could be wrong.* *I’m reading sites using Google Translate.* The representatives are from a handful of companies related to stuff like media, internet, video games, you get the point.

Physics, SpringBone


SpringBone is the VRM version of Dynamic Bones. It is code that does the math that makes a bone or chain of bones in an armature (and therefore the vertices rigged/assigned/weight painted to that bone) move based on real world physics math. Things like damping, elasticity, inertia, gravity, ‘physical’ size... *that kind of stuff that I do not really understand, I just adjust numbers and wiggle until it looks right.*

SpringBone also includes code for colliders: invisible spheres that can affect the physics math of the SpringBones, generally by pushing bones away.

Example: vertices of the bangs in the hair mesh are assigned/weight painted to bones in the armature. SpringBone code is added to those hair bones and various values are adjusted so the hair moves appropriately. Collider code is added to the head bone, creating an invisible sphere aligned with the avatar's forehead. The code tells the hair bones’ SpringBones and the head bone’s Collider how to interact, in this case keeping the hair bones from clipping into the avatar’s forehead by adjusting the math that moves the hair bone to account for something in the way (*some kind of 3d multiplication that i do not understand*). That something is a combo of the size of the Collider and the physics settings of the SpringBones.

Colliders can be used in the hand bones to bap cat ears, in the head and shoulders to keep long hair from clipping, and in thigh bones to keep skirt bones from clipping through legs. There are a lot of options. There will be lots of settings to tweak. *I don’t have any good numbers for anything at this time.*
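For the curious, here is a simplified sketch of one physics step for a single spring bone with sphere colliders. It is my own stripped-down version of the general idea (keep some of last frame's motion, pull toward the rest pose, add gravity, keep the bone length, push out of spheres), not the actual UniVRM code. All of the names and numbers are assumptions.

```python
import math


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def length(a):
    return math.sqrt(sum(x * x for x in a))


def keep_length(head, tail, bone_length):
    """Bones do not stretch: put the tail back at bone_length from the head."""
    offset = sub(tail, head)
    dist = length(offset)
    if dist == 0.0:
        return tail
    return add(head, scale(offset, bone_length / dist))


def spring_step(tail, prev_tail, head, bone_length, rest_dir,
                stiffness, drag, gravity, dt, colliders):
    """Return the bone tail position for the next frame."""
    inertia = scale(sub(tail, prev_tail), 1.0 - drag)  # keep some of last frame's motion
    pull = scale(rest_dir, stiffness * dt)             # spring back toward the rest pose
    fall = scale(gravity, dt)                          # gravity
    nxt = keep_length(head, add(add(add(tail, inertia), pull), fall), bone_length)
    for center, radius in colliders:
        offset = sub(nxt, center)
        dist = length(offset)
        if 0.0 < dist < radius:                        # tail ended up inside the sphere
            nxt = add(center, scale(offset, radius / dist))  # push it to the surface
            nxt = keep_length(head, nxt, bone_length)
    return nxt


head, tail, down = (0.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, -1.0, 0.0)
hanging = spring_step(tail, tail, head, 1.0, down, 1.0, 0.4, down, 1 / 60, [])
forehead = ((0.0, -1.0, -0.5), 0.6)  # made-up collider: (center, radius)
pushed = spring_step(tail, tail, head, 1.0, down, 1.0, 0.4, down, 1 / 60, [forehead])
```

Without a collider the bone just hangs where it was. With the sphere in the way, the tail gets nudged forward out of it, which is the "keep the bangs out of the forehead" effect.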

Inverse Kinematics

Inverse kinematics - Wikipedia

Forward kinematics is asking a 3D model to “translate/rotate your shoulder bone, then your elbow bone, then your wrist bone, then your hand bone” to reach a certain pose. But the model already knows the length and width of its limbs, what directions those limbs can move, whether a joint is a ball or hinge joint (shoulders/hips vs elbows/knees), and several other things, but you get the point. Because the model knows those numbers, you can instead ask it to “translate/rotate the hand bone”, and the math can figure out what translation/rotation needs to be applied to the connected bones. That is inverse kinematics (IK). When you’re posing a VRoid in pose mode, the software is using IK math to figure out how to move the bones and joints connected to the hand you’re dragging across the screen, while trying to keep the joints from bending in a direction they shouldn't and keeping joints connected by moving more bones if a bone moves ‘too far’.
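To show the difference, here is a minimal 2D sketch of a two-bone arm (upper arm plus forearm) with the shoulder at the origin. `forward` is the forward kinematics direction (angles in, hand position out) and `two_bone_ik` is the inverse (hand position in, angles out), using the law of cosines. Real IK solvers handle 3D, joint limits, and longer chains, so this is only the simplest case, with invented function names.

```python
import math


def forward(upper, lower, shoulder, elbow):
    """Forward kinematics: joint angles (radians) in, hand position out."""
    elbow_x = upper * math.cos(shoulder)
    elbow_y = upper * math.sin(shoulder)
    hand_x = elbow_x + lower * math.cos(shoulder + elbow)
    hand_y = elbow_y + lower * math.sin(shoulder + elbow)
    return hand_x, hand_y


def two_bone_ik(upper, lower, target_x, target_y):
    """Inverse kinematics: hand target in, (shoulder, elbow) angles out."""
    dist = math.hypot(target_x, target_y)
    # Clamp to what the arm can actually reach, so joints stay connected.
    dist = max(abs(upper - lower), min(upper + lower, dist), 1e-9)
    # Law of cosines gives the interior angle at the elbow...
    cos_interior = (upper ** 2 + lower ** 2 - dist ** 2) / (2 * upper * lower)
    elbow = math.pi - math.acos(max(-1.0, min(1.0, cos_interior)))
    # ...and the angle between the upper arm and the line to the target.
    cos_inner = (upper ** 2 + dist ** 2 - lower ** 2) / (2 * upper * dist)
    inner = math.acos(max(-1.0, min(1.0, cos_inner)))
    shoulder = math.atan2(target_y, target_x) - inner
    return shoulder, elbow


shoulder_angle, elbow_angle = two_bone_ik(1.0, 1.0, 1.0, 1.0)
hand = forward(1.0, 1.0, shoulder_angle, elbow_angle)  # should land on the target
```

Feeding the IK answer back through `forward` puts the hand on the target, which is a handy way to check the math. Dragging the target out of reach just stretches the arm straight toward it.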