If you want to make a good custom voice, it’s important to know what impacts the sound of your voice to begin with. That way, you’ll have a better idea of how to create a good prompt for your custom voice.
Voices are incredibly complex, and the unique way they sound is largely determined by two main factors, pitch and resonance, with placement also playing a part. Pitch is how high or low your voice is. Resonance is how the sound of your voice is shaped by the structure and size of your throat, mouth, and nose. Placement is where your resonance is perceived to be coming from, such as your nose or chest. The parts of your body that shape your voice include your vocal cords, mouth, throat, and nasal cavity. Your body is almost like a musical instrument, and your voice is the sound it produces.
What sets pitch?

Pitch is created by the vibration of your vocal cords, which sit inside your larynx (or voice box). When your cords are stretched tight, they vibrate faster and make a higher pitch. When they are looser and thicker, they vibrate slower and create a lower pitch. Think of a guitar string. A really tight guitar string creates a higher note, while a really loose guitar string creates a lower one.
The pressure of your breath also plays a role. Stronger airflow can push the vocal cords to vibrate faster, which raises pitch. Hydration matters too: well-hydrated vocal cords vibrate smoothly, while dry cords make it more difficult to hit notes cleanly. Of course, if you’re not a singer, this may be less of a concern for you.
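If you like numbers, the guitar-string comparison above can be sketched in a few lines of Python. This uses the textbook formula for an ideal stretched string, not a model of real vocal cords, which are far more complicated. The function name and the sample values are made up for illustration.

```python
import math

def string_frequency(length_m, tension_n, mass_per_length_kg_m):
    """Fundamental frequency (Hz) of an ideal stretched string."""
    return math.sqrt(tension_n / mass_per_length_kg_m) / (2 * length_m)

# Illustrative values only (roughly a guitar string), not vocal cord measurements.
loose = string_frequency(0.65, 60.0, 0.0004)
tight = string_frequency(0.65, 240.0, 0.0004)  # four times the tension
thick = string_frequency(0.65, 60.0, 0.0016)   # four times the mass per length

print(f"loose: {loose:.0f} Hz, tight: {tight:.0f} Hz, thick: {thick:.0f} Hz")
```

In this simplified picture, quadrupling the tension doubles the frequency, while quadrupling the thickness (mass per length) halves it. That is the same direction of change described above: tighter means higher, looser and thicker means lower.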
What shapes resonance?

Resonance is a little harder to explain than pitch. We have to think about how your voice actually gets from your vocal cords to someone’s ear. It needs to travel through your throat, mouth, and nasal cavity. As the sound waves of your voice make that journey, the size and shape of those spaces impact how your voice sounds. For example, opening your mouth wide or lowering your tongue can give your voice a darker, rounder quality. You can try it yourself! (Just make sure you’re not in a quiet office.)
Your nasal passages and soft palate also matter. When your soft palate closes off your nose, your sound stays in your mouth and throat, giving you a fuller tone. When air flows through your nose, it adds a nasal quality. Singers and speakers often learn to adjust these spaces to create a brighter or richer voice. As you’ll learn, these factors can be hugely important in how your voice is perceived by an audience.
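To see why the size of these spaces matters, here is a rough sketch that treats the vocal tract as a simple uniform tube, closed at the vocal cords and open at the lips. That is a common classroom simplification, not how a real throat and mouth are shaped, and the tube lengths and speed of sound below are assumed values chosen for illustration.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air

def tube_resonances(length_m, count=3):
    """Resonances (Hz) of a uniform tube closed at one end: (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_M_S / (4 * length_m)
        for n in range(1, count + 1)
    ]

longer = tube_resonances(0.175)  # assumed length, roughly an adult vocal tract
shorter = tube_resonances(0.14)  # assumed shorter tract

print([round(f) for f in longer])
print([round(f) for f in shorter])
```

The shorter tube resonates at higher frequencies across the board, even though nothing about the pitch has changed. That shift in resonance is part of what makes two voices singing the same note sound different.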
What’s the difference between pitch and resonance?
Pitch is how high or low your voice is. Resonance is how the sound of your voice is impacted by the structures it moves through. You can think of pitch as the musical note and resonance as the speaker system that shapes how that note actually sounds.
Factor | Pitch | Resonance |
---|---|---|
What it is | How high or low your voice sounds | The color and richness of your voice |
Main driver | Vocal cord length and tension | Shape and size of your throat, mouth, and nasal cavity |
Everyday example | A child’s high-pitched voice vs. an adult’s deeper one | A nasal-sounding voice vs. a full, round one |
What is placement?
Placement is where your voice seems to be coming from when you speak. Some people sound like their voice comes more from their nose, while others sound like it comes from deep in their chest (yes, this is why some people sound more nasal than others). That’s where the resonance of their voice is largely located.
However, with practice, you can shift the placement of your voice. Imagine the sound moving forward towards your lips or back in your throat. This has a big impact on how your voice actually sounds. A voice that’s more forward can sound brighter and clearer, while a voice that’s lower and further back can sound warmer or heavier.
How to write a good prompt for our custom AI voice feature
When you create a custom AI voice in Visla, you have to guide our AI with a prompt. But what makes a good prompt? There are a lot of factors that go into writing a good prompt, which is why we’re going to go over them right here.

Masculine vs. feminine
Though there are no hard and fast rules here, people tend to hear voices as more feminine or masculine based on pitch, resonance, and placement. If you want our AI to lean in one direction, you can add terms like “masculine” or “feminine” or “man” or “woman” to your prompt. This helps give the AI a direction to go in.
Pitch
Pitch is one of the simplest factors you can control with your prompt. While, yes, pitch is an important factor for what makes a voice sound more masculine or feminine, you also have control over it specifically. A higher pitch can feel brighter and lighter, while a lower pitch can sound richer and more grounded.
Other voice characteristics
Beyond the fundamentals, you can also add different vocal qualities to your prompt. Some qualities include “gravelly,” “smooth,” “melodic,” “warm,” or “bright,” but your imagination is really the limit here. Our AI won’t come up with a radically different voice for every specific, esoteric quality you add, but these descriptors do influence the result.
Vibe
Don’t forget the vibe. Yes, the vibe – it’s really the best way to describe it. Do you want your custom voice to sound more energetic? Calm? Serious? Playful? Something else? These cues about the “vibe” can make a bigger difference than you might expect. Asking for a “serious” voice, of course, results in a vastly different voice than asking for an “energetic” one.
Factor | What it means | Examples you can use in a prompt |
---|---|---|
Masculine vs. Feminine | General direction based on pitch and resonance | “masculine,” “feminine,” or blended terms |
Pitch | How high or low the voice sounds | “deep,” “high-pitched,” “mid-range” |
Other characteristics | Textures that shape the tone | “smooth,” “gravelly,” “warm,” “melodic,” “bright” |
Vibe | The personality or energy of the voice | “energetic,” “calm,” “serious,” “playful” |
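If you plan to create several voices, it can help to assemble prompts from the four factors in the table above so that none gets forgotten. The sketch below is only an illustration: `build_voice_prompt` is a hypothetical helper, not part of Visla, and the word order it produces is an arbitrary choice. The output is plain text that you would paste into the prompt field yourself.

```python
def build_voice_prompt(direction="", pitch="", qualities=(), vibe=""):
    """Join the chosen descriptors into one plain-text voice prompt.

    Hypothetical helper for illustration; every argument is optional,
    but at least one descriptor must be provided.
    """
    words = [w for w in (vibe, pitch, *qualities, direction) if w]
    if not words:
        raise ValueError("Provide at least one descriptor")
    return ", ".join(words) + " voice"

prompt = build_voice_prompt(
    direction="feminine",
    pitch="mid-range",
    qualities=("warm", "melodic"),
    vibe="playful",
)
print(prompt)  # playful, mid-range, warm, melodic, feminine voice
```

Keeping each factor as its own argument makes it easy to change one thing at a time, such as swapping “playful” for “serious,” and compare the results.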
How to know what type of voice to create
Before you dive into prompts, take a step back. The best voice is not just about what sounds good to you. It’s about what will connect with your audience and support the goals of your campaign or your business. A voice that works in a training video may not fit in a high-energy sales clip.
Know your audience and your goals
Think about who will hear this voice. What tone will feel trustworthy, engaging, or inspiring for them? Match the style of the voice to the purpose of your video, whether that’s teaching, persuading, or entertaining.
How voices are perceived
People pick up on subtle differences in voices, and these impressions shape how they feel about your message. Research shows that pitch, variation, and tone can all influence how trustworthy, competent, or charismatic someone sounds. Here’s a quick guide:
Voice Factor | How it affects perception | Further insights |
---|---|---|
Pitch | Lower pitch often feels more competent and trustworthy; higher pitch can increase warmth in women’s voices | Study of voice perception in blind participants |
Melodic vs. flat (pitch variability) | Wider pitch range sounds more engaging and charismatic | Study on perceived speaker charisma; Study on charismatic speech |
Pace (speech rate) | Faster speech can boost persuasion and competence; very slow speech can hurt impressions | Study on gaze and speech rate; Study on perceived trustworthiness |
Gravelly/creaky (vocal fry) vs. smooth | Vocal fry lowers ratings of competence and trust, especially for women; smooth voices sound more attractive | Impact of vocal fry on young women in the labor market; Study on vocal attractiveness |
Breathiness | Often makes a voice feel more feminine and can shape gender perception | Study on perception of voice breathiness |
Context & culture | Preferences shift with culture and situation; no one-size-fits-all | Voice perception has changed across time and cultures |
Now that you’re armed with knowledge, you can create your first custom AI voice with confidence!
FAQ
A custom AI voice is a synthetic voice designed to match your brand’s personality (like its tone, rhythm, accent, and delivery style) so your videos sound distinctively like “you.” With Visla, you can either describe the voice you want or replicate your own. Once created, the voice can be reused across projects, ensuring consistent and authentic narration. This moves beyond generic text-to-speech and gives you full control over how your brand sounds.
Modern AI voice systems are so advanced that many listeners cannot reliably distinguish them from real human speech. In fact, controlled studies often show that detection rates are close to chance. While brain scans reveal subtle differences in how people process synthetic and natural voices, most users find AI voices nearly indistinguishable in practice.
When you speak, you hear your voice in two ways: through the air and through vibrations in your bones. A recording only captures the air sound, which makes it feel unfamiliar. That’s why most people think they sound higher pitched in recordings. The recording isn’t wrong. It’s just missing the vibrations you normally feel inside your head.
Yes, even small adjustments make a difference. For example, good hydration helps your voice sound clearer and smoother. You can also practice speaking with more forward placement to brighten your tone. While training helps, little everyday habits add up quickly.
AI can capture pitch, resonance, and tone extremely well, but it still lacks some of the subtle emotional nuance of natural speech. That’s why adding descriptive cues like “warm” or “playful” in your prompt is important. These give the AI extra guidance to produce a voice that feels closer to a human speaker.
Studies show listeners often judge intent and credibility more from tone than actual word choice. A calm, steady voice can make even difficult information easier to accept. Meanwhile, a rushed or monotone delivery can undercut an otherwise solid message.