Bits and Bytes--Understanding Digital Jargon for Your
Home Voice Over Recording Studio

aiff, wav, mp3, sample rates--what's it all about?

"Send me a 16 bit 44.1 kilohertz .wav file and a 16 bit 44.1 kilohertz .aiff file to my FTP site, and an MP3 file at 160k in an email."

Uh, how's that again?

OK, you've performed your first voice over gig from your home voice over studio and it's time to deliver your performance to your client.  Back in the day you would mix it down to quarter inch tape and FedEx it for next day delivery.  But, whoa! That was in the last century!  Today tape has gone the way of MC Hammer pants--everything's digital.

And you need to learn the lingo--and moreover the concepts represented by these words--to communicate effectively with your clients and deliver your work in the format that they require.  So you need to understand and speak digital.

What is Digital Recording?

OK, if you know this stuff you can skip this part, but if you're not sure, clear your mind and settle in 'cause I'm going to explain digital recording once again.  So what is digital recording?  Wait a minute... for that matter what is digital?  Let's start with that. 

Representing information in a digital format came out of the computer industry.  Computers store information with little switches just like the light switch on the wall.  Now that on-the-wall light switch only has two states--it can be ON or OFF.  These two states can be represented with just two digits--a zero (0) for off and a one (1) for on.

In everyday life we use a base ten number system which uses ten digits (0,1,2,3,4,5,6,7,8,9) to represent any number.  Ahhh, but what if you only have two digits?  Well, to represent numbers with two digits you need to use a different number system-- a base two number system.  This is the renowned binary number system. 

There have been thousands of articles describing just how binary numbers work, but suffice it to say that we can represent a (base ten) number like 1,362,478 with a binary number like 101001100101000101110.  And that representation of the number can be stored in those little light switches--let's see, on, off, on, off, off, on, on... OK, you get the idea.
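If you'd like to watch that conversion happen, here's a minimal Python sketch.  The function names (to_binary and to_switches) are made up for this illustration; they aren't part of any audio software.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative whole number to a string of binary digits."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next bit, right to left
        n //= 2                    # move on to the next column
    return "".join(reversed(digits))


def to_switches(bits: str) -> list[str]:
    """Spell out a binary string as light switch positions."""
    return ["on" if bit == "1" else "off" for bit in bits]


print(to_binary(1_362_478))                    # 101001100101000101110
print(to_switches(to_binary(1_362_478))[:7])   # the first seven switches
```

Python can also do this in one step with its built-in bin() function; the loop is spelled out here only to show the repeated divide-by-two idea.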

By the way, each of the ones or zeros is called a Binary Digit.  And since they're little tiny things--and because technical jargon often has some word humor in it--this term got shortened to Bit: BInary digiT = Bit.  These binary digits were often worked on in groups, and the standard group settled at eight bits.

Well, it didn't take long for those merry computer folks to name that group a byte.  Get it? Like bite, respelled with a "y" so nobody would confuse it with bit.  A bit and a byte.  (And four bits is called a nibble but you'll rarely hear that term anymore.)  Yeah, those computer labs are a virtual Comedy Store.

So How Do You Record Sound with Digital Numbers?

So now you have a number system that you can store in a computer's memory. How do you convert sound into numbers? Let's start with a quick analogy.  Suppose we want to map the peaks and valleys of a mountain range.

One way to do this would be to walk along in a straight line and, every few feet, take an altitude measurement.  Walk ten feet, measure: 5234 feet.  Walk ten feet, measure: 5245 feet.  Walk ten feet, measure: 5251 feet.  And so forth for the whole mountain range.

This would give you a series of numbers, 5234, 5245, 5251--or in binary 1010001110010, 1010001111101, 1010010000011-- that you can save in your computer.  And by recalling these numbers you can redraw the peaks and valleys of the mountains. 

Now let's look at a sound wave.

Here's a typical audio sound wave.  Wow!  That looks just like a mountain range.  Well, at least a mountain range reflected in a tranquil alpine lake.  So we can measure the sound wave in a similar way.

Move along a wee bit from left to right (which is the TIME axis) and every few moments take a measurement of how high or low the waveform is (which represents how LOUD the sound wave is--amplitude for you science buffs). 

Once again you get a series of numbers--say 36345, 36218, 36154, etc. or in binary 1000110111111001, 1000110101111010, 1000110100111010--that you can store in your computer to represent the audio waveform.  And if you feed these numbers, at the same rate you recorded them, into a device called a digital-to-analog converter, PRESTO! You hear the same sound you recorded.  Ahh, a digital recording!
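Here's a rough Python sketch of that measuring process, using a pure tone as a stand-in for a real voice.  The function name sample_wave and its parameters are invented for this example.  It also assumes the numbers run from 0 to 65535 to match the example numbers above; real 16 bit audio files usually store negative and positive numbers instead.

```python
import math


def sample_wave(frequency_hz, measurements_per_second, num_samples, bits=16):
    """Measure a sine wave at evenly spaced moments and return whole numbers."""
    top = 2 ** bits - 1  # the highest number this many bits can hold
    samples = []
    for i in range(num_samples):
        t = i / measurements_per_second                     # time of this measurement
        height = math.sin(2 * math.pi * frequency_hz * t)   # between -1.0 and 1.0
        samples.append(round((height + 1) / 2 * top))       # map onto 0..top
    return samples


# Five measurements of a 440 hertz tone, taken 44,100 times per second.
samples = sample_wave(440, 44100, 5)
for value in samples:
    print(value, format(value, "016b"))  # the number, then its 16 binary digits
```

Each printed line is one measurement: the height as an everyday number, then the same height as the ones and zeros the computer actually stores.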

Now you don't really need to know exactly how this all happens but understanding the general principles will help you make sense of all the digital recording terms that are tossed around.

Your First Terms: Sample Rate and Sample Format

Let's go back to our pleasant stroll along that mountain top and look a little closer at what we're doing.  First of all we rather arbitrarily decided to take an altitude measurement every ten feet.  But what if we took a measurement every five feet?  If we did that, the representation of the mountain would be more accurate, wouldn't it?

And if we took the measurement every 20 feet the representation would be less accurate.  So if we're walking at a steady pace, the accuracy is determined by how often we take a measurement.

The techno term for taking a measurement is TAKING A SAMPLE or SAMPLING.  And how often you measure is called the SAMPLE RATE.

Then there's the other piece of information we collected:  how high the mountain is. Remember we measured it in feet (sorry, rest of the world).  But what if we had measured it in inches?  Then the measurements would be twelve times more accurate. 

But if we measured it in yards then the measurements would be less accurate--one third as accurate.  So the accuracy of our height measurements is determined by which unit of measurement--which ruler-- we use.

The techno term for the number of units that you use to measure is the SAMPLE FORMAT (you'll also hear it called bit depth).

Sample Rates in Digital Audio

So let's apply this to digital audio.  We'll look at SAMPLE RATE first.  Obviously we don't walk along ten feet... and we "walk" a lot faster.  In digital audio we actually measure many thousands of times each second.  So the SAMPLE RATE in digital audio is expressed as how many samples there are each second--samples per second.  For example, CD quality digital sound has a sample rate of 44,100 samples per second.

And since this stuff was all made up by scientists the number is usually expressed with a decimal and the term "kilo" which means thousands and the term "hertz" which means times per second.  So we would say the SAMPLE RATE of a CD is 44.1 kilohertz (44,100 times per second).
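To put some numbers on that, here's a small Python sketch of the arithmetic.  The function names are made up for illustration, and the 30 second spot is just an assumed example length.

```python
def to_kilohertz(sample_rate_hz):
    """Express samples per second the way the scientists do, in thousands."""
    return sample_rate_hz / 1000


def samples_in(seconds, sample_rate_hz=44100):
    """How many measurements get taken during a recording of this length."""
    return seconds * sample_rate_hz


def microseconds_between_samples(sample_rate_hz):
    """How long we 'walk' between one measurement and the next."""
    return 1_000_000 / sample_rate_hz


print(to_kilohertz(44100))                 # 44.1
print(samples_in(30))                      # samples in a 30 second spot
print(microseconds_between_samples(44100)) # millionths of a second per step
```

So at the CD rate, a 30 second spot is well over a million measurements, each taken only a few millionths of a second after the last.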

Sample Formats in Digital Audio

So how about the SAMPLE FORMAT?  Notice that when we measured the mountain altitude, the height of the mountain at a particular point didn't change--but the number we used to represent it changed depending on which scale we used.  So it could be about 1,748 yards or 5,245 feet or 62,940 inches.
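Here's the ruler idea as a quick Python sketch.  The names measure and error_in_inches are invented for this example, and it assumes each ruler can only report a whole number of its own units.

```python
HEIGHT_IN_INCHES = 62_940  # the same point on the mountain: 5,245 feet


def measure(height_in_inches, inches_per_unit):
    """Read the height as a whole number of units, like reading a ruler."""
    return round(height_in_inches / inches_per_unit)


def error_in_inches(height_in_inches, inches_per_unit):
    """How far off the whole-unit reading is from the true height."""
    reading = measure(height_in_inches, inches_per_unit)
    return abs(height_in_inches - reading * inches_per_unit)


for name, inches_per_unit in (("yards", 36), ("feet", 12), ("inches", 1)):
    print(name,
          measure(HEIGHT_IN_INCHES, inches_per_unit),
          "off by", error_in_inches(HEIGHT_IN_INCHES, inches_per_unit), "inches")
```

The yard ruler lands a full foot off, while the finer rulers hit this particular height exactly.  The smaller the unit, the smaller the worst possible miss.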

So we're actually dividing the overall height into smaller and smaller--and more and more--units.  And the more units we use, the more precise the measurement.  Digital data achieves this increase in accuracy by using more and more BITS to represent the waveform height.  Remember, a bit is like one light switch, and digital numbers require one bit per column:  1101 requires 4 BITS, 10011010 requires eight BITS, etc. 

Using more BITS lets you divide your measurement into more and smaller parts for more accuracy.  For you math freaks this relationship is computed by using the number of BITS used as the exponent of 2 ('cause it's binary, right?): 2^2, 2^3, 2^4, 2^5, etc.  Here's the number of divisions you get for additional bits:

Bits    Power of 2    Divisions
8       2^8           256
12      2^12          4,096
16      2^16          65,536
20      2^20          1,048,576
24      2^24          16,777,216
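If you want to check the arithmetic yourself, here's a tiny Python sketch.  The function name divisions is just a label chosen for this example.

```python
def divisions(bits):
    """Number of distinct heights you can represent with this many bits."""
    return 2 ** bits


for bits in (8, 12, 16, 20, 24):
    print(f"{bits:>2} bits  2^{bits:<2}  {divisions(bits):>10,}")
```

Notice that every extra bit doubles the number of divisions, which is why the totals climb so fast.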


So you see when you use more bits, you get a lot more accuracy!  This additional accuracy translates into clearer sound with less noise (hiss).  For you audio purists, each BIT gives you an additional 6 dB (decibels) of theoretical signal to noise ratio.  This is called the dynamic range.

So 8 bits gives you a dynamic range of 48dB.  16 bits gives you a dynamic range of 96dB.  And 24 bits gives you a whopping 144dB dynamic range, which is wider than the range of human hearing.
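The "6 dB per bit" figure is a rounded rule of thumb.  As a sketch of where it comes from: the standard decibel formula for comparing the largest value to the smallest step is 20 times the base ten logarithm of the number of divisions.  The function names below are made up for this illustration.

```python
import math


def rule_of_thumb_db(bits):
    """The quick estimate used in the text: 6 decibels per bit."""
    return 6 * bits


def dynamic_range_db(bits):
    """Theoretical figure: 20 * log10 of the number of divisions (2 ** bits)."""
    return 20 * math.log10(2 ** bits)


for bits in (8, 16, 24):
    print(bits, "bits:", rule_of_thumb_db(bits), "dB by rule of thumb,",
          round(dynamic_range_db(bits), 1), "dB by formula")
```

The formula works out to a hair over 6 dB per bit, so the rule of thumb slightly undersells each format, but it's close enough for conversation with a client.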

So when they invented CDs they had to pick the number of bits, which determined how many divisions to divide the waveform height into.  They figured that 96 decibels was pretty darn good so they settled on 16 BITS as the CD SAMPLE FORMAT.

AHA!  The CD Sample Format and Sample Rate!

When the numbers are thrown around they are usually expressed with the SAMPLE FORMAT first and the SAMPLE RATE second.

Soooooo... the digital sound data recorded on a CD is 16 BIT, 44.1 kilohertz data.  By the way, this is the most common data format and rate used to send files to clients, so if they don't have a preference, use this.  Another common format is 16 BIT, 48 kilohertz.  This is often used on video editing systems.

You can also save this audio data on your computer in exactly the same way.  This happens when you "rip" a CD track to your computer.

If you save it to a PC computer you will create a .wav (pronounced "wave") file.

If you save it to a Mac computer you will create an .aiff (pronounced "A-I-F-F"--sorry!) file.
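You'll normally create these files from your recording software's export menu, but to show there's no magic inside one, here's a hedged Python sketch that writes a one second test tone as a 16 bit, 44.1 kilohertz mono .wav file using Python's standard wave module.  The function names, the 440 hertz tone and the file name tone.wav are all made up for this example, and the file is written to a temporary folder that is deleted afterward.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path, seconds=1, sample_rate=44100, frequency_hz=440.0):
    """Write a mono, 16 bit sine tone to a .wav file at `path`."""
    frames = bytearray()
    for i in range(seconds * sample_rate):
        height = math.sin(2 * math.pi * frequency_hz * i / sample_rate)
        # Scale to about half volume and pack as a 2 byte signed number.
        frames += struct.pack("<h", int(height * 16383))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16 bits
        wav_file.setframerate(sample_rate)  # samples per second
        wav_file.writeframes(bytes(frames))


def describe_wav(path):
    """Return (channels, bits, sample_rate, number_of_samples) for a .wav file."""
    with wave.open(path, "rb") as wav_file:
        return (wav_file.getnchannels(), wav_file.getsampwidth() * 8,
                wav_file.getframerate(), wav_file.getnframes())


with tempfile.TemporaryDirectory() as folder:
    tone_path = os.path.join(folder, "tone.wav")
    write_test_tone(tone_path)
    info = describe_wav(tone_path)

print(info)  # (1, 16, 44100, 44100)
```

The describe_wav helper is also a handy way to double check a file before you send it: it reads back the channel count, bits and sample rate stored in the file's header.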

PCM Recording

Trust me, you'll never need to know this, but this type of recording is called PCM or PULSE CODE MODULATION recording.  If the sample format and sample rate are high enough it can accurately record sound beyond the range of human hearing.

For example, 24 bit, 96 kilohertz recording can record a dynamic range of 144 decibels and a sound spectrum from 20 to 48,000 hertz (the highest frequency you can capture is half the sample rate).  They came up with this method of recording to accurately record the best possible sound without any of the hiss, rumble, wow and flutter that was inherent in mechanical recording on tape and LPs.
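That 48,000 hertz ceiling isn't arbitrary.  A digital recording can only capture frequencies up to half its sample rate (engineers call this the Nyquist limit).  Here's a minimal Python sketch of that rule; the function name is made up for illustration, and the 20,000 hertz figure for the top of human hearing is the usual textbook approximation.

```python
TOP_OF_HUMAN_HEARING_HZ = 20_000  # rough textbook figure


def highest_frequency_hz(sample_rate_hz):
    """The highest pitch a given sample rate can capture: half the rate."""
    return sample_rate_hz / 2


for rate in (44_100, 48_000, 96_000):
    limit = highest_frequency_hz(rate)
    print(rate, "samples per second captures up to", limit, "hertz;",
          "covers human hearing:", limit >= TOP_OF_HUMAN_HEARING_HZ)
```

This is also why the CD rate of 44.1 kilohertz is enough for voice work: its ceiling already sits a little above what people can hear.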

But Wait... There's More!

And as an added bonus... When you send a digital audio file to someone you're NOT sending them sound, you're sending them the numbers that represent the sound.  So you can copy the digital data, and then copy that copy, and then copy that copy of the copy, etc. a thousand times and each copy is an EXACT REPLICA of the original!

And you can do all those cool things you do with digital files: store it on your hard drive, upload it to a website, download it, save it on a flash drive, etc.

So for fancy projects that want the best audio, .wav files and .aiff files are the way to go!

There's Only One Problem...

These files are BIG... I mean REALLY BIG! One minute of 16 bit 44.1 kilohertz mono is about 5 megabytes! A thirty minute narration track is more than 150 megabytes.  And if you have multiple takes!  It's not unusual for the files in an audio project to be 500 megabytes to several GIGAbytes!
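You can estimate these sizes yourself: multiply the sample rate by the bytes per sample (bits divided by eight), the number of channels and the length in seconds.  Here's a small Python sketch; the function names are made up for this example, and it ignores the few dozen bytes of header a real .wav or .aiff file adds.

```python
def pcm_size_bytes(seconds, sample_rate=44100, bits=16, channels=1):
    """Approximate size of uncompressed audio, not counting the file header."""
    bytes_per_sample = bits // 8
    return seconds * sample_rate * bytes_per_sample * channels


def megabytes(size_in_bytes):
    """Convert bytes to megabytes (counting a megabyte as one million bytes)."""
    return size_in_bytes / 1_000_000


print(megabytes(pcm_size_bytes(60)))               # one minute, mono
print(megabytes(pcm_size_bytes(30 * 60)))          # thirty minutes, mono
print(megabytes(pcm_size_bytes(60, channels=2)))   # one minute, stereo
```

Going stereo doubles the size, and so does doubling the sample rate, so the numbers add up fast once a project has multiple takes.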

Here's One Time When You Have To Sound Like You Know What You're Talking About!

When the client says he wants .wav (or .aiff) files EMAILED to him, you need to correct him.  Quote: ".wav files will be too big to email!"  Unless it's only one take of a few seconds, the files will swamp his email if you send them, and they will likely be rejected by his mail server.  Fact of life.  Don't let them argue the matter.  Don't email .wav or .aiff files.

So What's a Budding Voice Over Artist To Do?

Don't panic!  There are two possible solutions to the problem.  Email smaller compressed files--we're talking MP3s.  Or upload the uncompressed .wav and .aiff files to a web server via FTP for the client to download.


by William Williams

Aliso Creek Productions
4106 W. Burbank Bl. * P.O. Box 10006
Burbank, CA 91510 • 818-954-9931
© 2015 Aliso Creek Productions * All Rights Reserved
