How to create a custom AI voice, and how voices work in general

If you want to make a good custom voice, it’s important to know what impacts the sound of your voice to begin with. That way, you’ll have a better idea of how to create a good prompt for your custom voice. 

Voices are incredibly complex, and the unique way they sound is largely determined by two main factors, pitch and resonance, with placement also playing a part. Pitch is how high or low your voice is. Resonance is how the sound of your voice is shaped by the structure and size of your throat, mouth, and nose. Placement is where your resonance is perceived to be coming from, such as your nose or chest. The parts of your body that shape your voice include your vocal cords, mouth, throat, and nasal cavity. Your body is almost like a musical instrument, and your voice is the sound it produces.

What sets pitch?

Pitch is created by the vibration of your vocal cords, which sit inside your larynx (or voice box). When your cords are stretched tight, they vibrate faster and make a higher pitch. When they are looser and thicker, they vibrate slower and create a lower pitch. Think of a guitar string. A really tight guitar string creates a higher note, while a really loose guitar string creates a lower one. 

The pressure of your breath also plays a role. Stronger airflow can push the vocal cords to vibrate faster, which raises pitch. Hydration matters too: well-hydrated vocal cords vibrate smoothly, while dry cords make it harder to hit notes cleanly. Of course, if you’re not a singer, this may be less of a concern for you.
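To put rough numbers on the guitar-string comparison, here is a minimal Python sketch. It uses the standard physics formula for an ideal stretched string, where frequency equals the square root of tension divided by mass per unit length, all divided by twice the string length. Real vocal cords are layered tissue, not strings, so treat this as an illustration of the analogy rather than a model of the voice. The function name and the sample values are made up for this example.

```python
import math

def string_frequency(length_m, tension_n, mass_per_length_kg_m):
    """Fundamental frequency (Hz) of an ideal stretched string."""
    return math.sqrt(tension_n / mass_per_length_kg_m) / (2 * length_m)

# Illustrative numbers only, not measurements of a real string or voice.
loose = string_frequency(length_m=0.65, tension_n=50.0, mass_per_length_kg_m=0.005)
tight = string_frequency(length_m=0.65, tension_n=200.0, mass_per_length_kg_m=0.005)
heavy = string_frequency(length_m=0.65, tension_n=50.0, mass_per_length_kg_m=0.020)

# Tighter string -> higher note; thicker (heavier) string -> lower note.
print(round(loose, 1), round(tight, 1), round(heavy, 1))  # 76.9 153.8 38.5
```

In this idealized picture, four times the tension doubles the pitch, and four times the mass per length halves it. That matches the tight-and-thin versus loose-and-thick pattern described above.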

What shapes resonance?

Resonance is a little harder to explain than pitch. We have to think about how your voice actually gets from your vocal cords to someone’s ear. It needs to travel through your throat, mouth, and nasal cavity. As the sound waves of your voice make that journey, the size and shape of those spaces impact how your voice sounds. For example, opening your mouth wide or lowering your tongue can give your voice a darker, rounder quality. You can try it yourself! (Just make sure you’re not in a quiet office.)

Your nasal passages and soft palate also matter. When your soft palate closes off your nose, your sound stays in your mouth and throat, giving you a fuller tone. When air flows through your nose, it adds a nasal quality. Singers and speakers often learn to adjust these spaces to create a brighter or richer voice. As you’ll learn, these factors can be hugely important in how your voice is perceived by an audience. 

What’s the difference between pitch and resonance?

Pitch is how high or low your voice is. Resonance is how the sound of your voice is impacted by the structures it moves through. You can think of pitch as the musical note and resonance as the speaker system that shapes how that note actually sounds. 

Factor | Pitch | Resonance
What it is | How high or low your voice sounds | The color and richness of your voice
Main driver | Vocal cord length and tension | Shape and size of your throat, mouth, and nasal cavity
Everyday example | A child’s high-pitched voice vs. an adult’s deeper one | A nasal-sounding voice vs. a full, round one

What is placement?

Placement is where your voice seems to be coming from when you speak. Some people sound like their voice comes from their nose, while others sound like it comes from deep in their chest (yes, this is why some people sound more nasal than others). That’s where the resonance of their voice is largely located.

However, with practice, you can shift the placement of your voice. Imagine the sound moving forward towards your lips or back in your throat. This has a big impact on how your voice actually sounds. A voice that’s more forward can sound brighter and clearer, while a voice that’s lower and further back can sound warmer or heavier. 

How to write a good prompt for our custom AI voice feature

When you create a custom AI voice in Visla, you have to guide our AI with a prompt. But what makes a good prompt? There are a lot of factors that go into writing a good prompt, which is why we’re going to go over them right here. 

Masculine vs. feminine

Though there are no hard and fast rules here, people tend to hear voices as more feminine or masculine based on pitch, resonance, and placement. If you want our AI to lean in one direction, you can add terms like “masculine” or “feminine” or “man” or “woman” to your prompt. This helps give the AI a direction to go in. 

Pitch

Pitch is one of the simplest factors you can control with your prompt. While, yes, pitch is an important factor for what makes a voice sound more masculine or feminine, you also have control over it specifically. A higher pitch can feel brighter and lighter, while a lower pitch can sound richer and more grounded. 

Other voice characteristics

Beyond the fundamentals, you can also add different vocal qualities to your prompt. Some qualities include “gravelly,” “smooth,” “melodic,” “warm,” or “bright,” but your imagination is really the limit here. Our AI won’t come up with a radically different voice for every specific, esoteric quality you add, but these descriptors do still matter.

Vibe

Don’t forget the vibe. Yes, the vibe – it’s really the best way to describe it. Do you want your custom voice to sound more energetic? Calm? Serious? Playful? Something else? These cues about the “vibe” can make a bigger difference than you might expect. Asking for a “serious” voice, of course, results in a vastly different voice than asking for an “energetic” one. 

Factor | What it means | Examples you can use in a prompt
Masculine vs. feminine | General direction based on pitch and resonance | “masculine,” “feminine,” or blended terms
Pitch | How high or low the voice sounds | “deep,” “high-pitched,” “mid-range”
Other characteristics | Textures that shape the tone | “smooth,” “gravelly,” “warm,” “melodic,” “bright”
Vibe | The personality or energy of the voice | “energetic,” “calm,” “serious,” “playful”
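If you like to plan your prompts before typing them in, the four factors above can be combined in a small Python helper like the one below. This is only a sketch: `build_voice_prompt` is a hypothetical name invented for this example, not part of Visla or any Visla API, and the comma-separated format is an assumption rather than a required syntax. The result is just a text description you could paste into the prompt field.

```python
def build_voice_prompt(gender=None, pitch=None, characteristics=(), vibe=None):
    """Join the chosen descriptors into one comma-separated description.

    Hypothetical helper for drafting prompt text; it does not call any service.
    """
    parts = [gender, pitch, *characteristics, vibe]
    # Skip anything left empty so the prompt only contains real descriptors.
    descriptors = [p.strip() for p in parts if p and p.strip()]
    if not descriptors:
        raise ValueError("give at least one descriptor")
    return ", ".join(descriptors) + " voice"

prompt = build_voice_prompt(
    gender="feminine",
    pitch="mid-range",
    characteristics=["warm", "smooth"],
    vibe="calm",
)
print(prompt)  # feminine, mid-range, warm, smooth, calm voice
```

Keeping each factor as a separate argument makes it easy to change one thing at a time, such as swapping “calm” for “energetic,” and compare the resulting voices.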

How to know what type of voice to create

Before you dive into prompts, take a step back. The best voice is not just the one that sounds good to you. It’s the one that will connect with your audience and support the goals of your campaign or your business. A voice that works in a training video may not fit in a high-energy sales clip.

Know your audience and your goals

Think about who will hear this voice. What tone will feel trustworthy, engaging, or inspiring for them? Match the style of the voice to the purpose of your video, whether that’s teaching, persuading, or entertaining.

How voices are perceived

People pick up on subtle differences in voices, and these impressions shape how they feel about your message. Research shows that pitch, variation, and tone can all influence how trustworthy, competent, or charismatic someone sounds. Here’s a quick guide:

Voice factor | How it affects perception | Further insights
Pitch | Lower pitch often feels more competent and trustworthy; higher pitch can increase warmth in women’s voices | Study of voice perception in blind participants
Melodic vs. flat (pitch variability) | Wider pitch range sounds more engaging and charismatic | Study on perceived speaker charisma; Study on charismatic speech
Pace (speech rate) | Faster speech can boost persuasion and competence; very slow speech can hurt impressions | Study on gaze and speech rate; Study on perceived trustworthiness
Gravelly/creaky (vocal fry) vs. smooth | Vocal fry lowers ratings of competence and trust, especially for women; smooth voices sound more attractive | Impact of vocal fry on young women in the labor market; Study on vocal attractiveness
Breathiness | Often makes a voice feel more feminine and can shape gender perception | Study on perception of voice breathiness
Context & culture | Preferences shift with culture and situation; no one-size-fits-all | Voice perception has changed across time and cultures

Now that you’re armed with knowledge, you can create your first custom AI voice with confidence!

FAQ

What is a custom AI voice?

A custom AI voice is a synthetic voice designed to match your brand’s personality (like its tone, rhythm, accent, and delivery style) so your videos sound distinctively like “you.” With Visla, you can either describe the voice you want or replicate your own. Once created, the voice can be reused across projects, ensuring consistent and authentic narration. This moves beyond generic text-to-speech and gives you full control over how your brand sounds.

How realistic are these voices?

Modern AI voice systems are so advanced that many listeners cannot reliably distinguish them from real human speech. In fact, controlled studies often show that detection rates are close to chance. While brain scans reveal subtle differences in how people process synthetic and natural voices, most users find AI voices nearly indistinguishable in practice.

Why does my voice sound different when I hear it in a recording?

When you speak, you hear your voice in two ways: through the air and through vibrations in your bones. A recording only captures the air sound, which makes it feel unfamiliar. That’s why most people think they sound higher pitched in recordings. The recording isn’t wrong. It’s just missing the vibrations you normally feel inside your head.

Can I change the way my voice sounds without training?

Yes, even small adjustments make a difference. For example, good hydration helps your voice sound clearer and smoother. You can also practice speaking with more forward placement to brighten your tone. While training helps, little everyday habits add up quickly.

Can technology fully replicate the uniqueness of a human voice?

AI can capture pitch, resonance, and tone extremely well, but it still lacks some of the subtle emotional nuance of natural speech. That’s why adding descriptive cues like “warm” or “playful” in your prompt is important. These give the AI extra guidance to produce a voice that feels closer to a human speaker.

Why does voice tone matter more than words in some situations?

Studies show listeners often judge intent and credibility more from tone than actual word choice. A calm, steady voice can make even difficult information easier to accept. Meanwhile, a rushed or monotone delivery can undercut an otherwise solid message.


