Other Posts in Speech

  1. Text to Speech in .Net using C#
  2. Speech Recognition in .Net using C#

Text to Speech in .Net using C#


This may seem like a departure from my normal set of posts, but speech technology (recognition, etc.) is one subject that I've always found interesting. I think it's because it's a subject that I know next to nothing about. Anyway, I've decided to fix that small problem and see if I can teach myself something about it. Since I'm working in .Net, though, I really don't have to do much work: Microsoft has been working on speech recognition for years now and has a set of classes built into .Net for us to use freely. That's why I like .Net: there's so much in the framework that there are complete sections that I've never taken a look at before...

Anyway, before I tackled the task of having the system recognize what I was saying (or whether it could tell my voice apart from others, which would be cool as well), I figured I would start with having it speak to me. More specifically, I wanted to see if it could take a bit of text and convert it to speech. It turns out this is an incredibly simple task:

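   // Needs a reference to System.Speech and "using System.Speech.Synthesis;"
   SpeechSynthesizer Synthesizer = new SpeechSynthesizer();
   // SpeakAsync returns right away; the speech plays in the background
   Synthesizer.SpeakAsync("This is a test");
   // Call Dispose only once the speech has finished, or it may be cut short
   Synthesizer.Dispose();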

That's it. Well, sort of anyway. For any project where you want to do speech, you have to add System.Speech as a reference (the classes used here live in the System.Speech.Synthesis namespace). The System.Speech classes are a managed wrapper around the SAPI (Speech API) 5.3 engine (or 5.1 if you're on XP SP2). Basically it takes away the pain points of dealing with the COM objects directly. As long as you're using XP (with service pack 2), Vista, Windows 7, Windows Server 2008, or Windows Server 2003, you should be good to go, as it should already be on the system (well, you also need .Net 3.0). Assuming those things, the code above will start up and say "This is a test" over your speakers. Note that SpeakAsync returns right away and the speech plays in the background, so disposing the synthesizer too early can cut the speech short; there are other ways to call it that give you more control.
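One of those other ways is the synchronous Speak method, which doesn't return until the phrase has been spoken. Here's a minimal sketch of a complete console program built around it, assuming a project that references System.Speech; the class name SpeakExample is just made up for illustration:

```csharp
using System.Speech.Synthesis; // needs a project reference to System.Speech

class SpeakExample // hypothetical name, for illustration only
{
    static void Main()
    {
        // The using block calls Dispose for us, and only after Speak has returned
        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            // Send the audio to the default playback device (normally the speakers)
            synthesizer.SetOutputToDefaultAudioDevice();

            // Speak blocks until the whole phrase has been spoken
            synthesizer.Speak("This is a test");
        }
    }
}
```

Because Speak blocks, there's no risk of disposing the synthesizer while it's still talking. The trade-off is that the calling thread waits, so SpeakAsync is probably still the better fit for anything with a UI.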

Now that's great and all, but what about volume or the rate at which it's said? Simple: the SpeechSynthesizer class has two properties, Volume (0 is silent, 100 is max volume) and Rate (0 is the baseline; the range is -10 to 10, with higher numbers faster and lower numbers slower). If we want, we can even choose where the sound goes. For instance, instead of sending it to the speakers, we could send it to a wav file:
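To make the volume and rate settings concrete, here's a small sketch that says one phrase quietly and slowly, then another one faster. The specific values (60, -3, 5) and the class name are arbitrary picks for illustration, not recommendations:

```csharp
using System.Speech.Synthesis; // needs a project reference to System.Speech

class VolumeAndRateExample // hypothetical name, for illustration only
{
    static void Main()
    {
        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            synthesizer.SetOutputToDefaultAudioDevice();

            synthesizer.Volume = 60; // valid range is 0 (silent) to 100 (max)
            synthesizer.Rate = -3;   // valid range is -10 (slowest) to 10 (fastest)
            synthesizer.Speak("This is a slower, quieter test");

            synthesizer.Rate = 5;    // the new rate applies to the next phrase spoken
            synthesizer.Speak("This is a faster test");
        }
    }
}
```

Values outside those ranges are rejected with an exception, so it's worth clamping anything that comes from user input.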

   Synthesizer.SetOutputToWaveFile("WAVE FILE PATH");
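Put together as a full program, that might look something like the sketch below. The output path (a file called speech-test.wav in the temp folder) and the class name are assumptions made up for this example:

```csharp
using System;
using System.IO;
using System.Speech.Synthesis; // needs a project reference to System.Speech

class WaveFileExample // hypothetical name, for illustration only
{
    static void Main()
    {
        // Hypothetical output location; any writable path should do
        string path = Path.Combine(Path.GetTempPath(), "speech-test.wav");

        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            // Redirect the output before speaking so the audio lands in the file
            synthesizer.SetOutputToWaveFile(path);

            // The synchronous call returns once the phrase has been rendered
            synthesizer.Speak("This is a test");

            // Detach the output so the synthesizer lets go of the file
            synthesizer.SetOutputToNull();
        }

        Console.WriteLine("Saved speech to " + path);
    }
}
```

The synchronous Speak is used here on purpose: the phrase has been fully rendered before the output is detached and the synthesizer is disposed.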

That's all it takes to save the speech to a file for playback later on. All of that is great, and we can even set the voice that is used... Well, we could if Microsoft had given us more than just one voice. Anyway, with this one class we can do quite a bit. So give it a try, leave feedback, and happy coding.