
**This API is still in heavy development, so this will more than likely not work for you. I am writing this entry as a running list of notes.**

I was recently tasked with implementing the SpeechRecognition API in a project. As usual, I found that the best place to gather documentation on it was MDN.

There are several great examples of the API on GitHub, which I used to start looking into how to implement it properly. The first thing you will want to understand is that speech recognition is quite restricted by browsers: as with cameras and microphones, the user will have to grant microphone permission before recognition can run.

The API for SpeechRecognition is actually quite expansive; however, to make a simple functioning web app, there really isn't a lot to it. In my case, I am working on an implementation for IceDev's IoT network...pagenodes.

Getting Started


There are a couple of variables you will want to grab from the browser before beginning. The recommended setup from the how-tos looks like this:

var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
var SpeechRecognitionEvent = window.SpeechRecognitionEvent || window.webkitSpeechRecognitionEvent

Since this is a new API, it may behoove you to add some kind of browser check so you can point users with "less than adequate" browsers toward Chrome. The key here is to go ahead and make your own new SpeechRecognition object:

var recognition = new SpeechRecognition();

By creating this object we now have access to the API, which is documented on MDN.
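The browser check mentioned above can be factored into a tiny helper. This is just a sketch — `getSpeechRecognition` is a made-up name, not part of the API, and it takes a window-like object so the logic can be exercised outside a browser:

```javascript
// Feature-detect the SpeechRecognition constructor on a window-like object.
// getSpeechRecognition is a hypothetical helper, not part of the API.
function getSpeechRecognition(win) {
  // Prefer the standard name, fall back to the webkit-prefixed one.
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// In the browser you would call it with the real window:
//   var SpeechRecognition = getSpeechRecognition(window);
//   if (!SpeechRecognition) { /* suggest the user try Chrome */ }
```

Returning `null` (instead of throwing) makes it easy to show a "please use Chrome" message when neither constructor exists.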

One of the cool things about SpeechRecognition is that it can hand spoken input to a function, where it can then be used in whatever way the user needs.

The base functionality of speech recognition is to attach a DOM event handler and then begin recognition:

document.body.onclick = function() {
  recognition.start();
  console.log('Now listening for speech');
}

At this point you have effectively set up a listener that starts recognition whenever you click on the body. You can then use recognition.onresult to listen for speech coming into the microphone. The example from MDN changes the background to whatever color is spoken:

recognition.onresult = function(event) {
  // show us what the event is
  console.log(event);
  // the spoken color is the transcript of the first result
  var color = event.results[0][0].transcript;
  document.body.style.backgroundColor = color;
}

This handy piece of code will change the background color of the page for you.
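The transcript itself lives in the event's nested results list (`event.results[0][0].transcript` per the API). Pulling it out can be isolated into a small function — `getTranscript` is a made-up name, and the object below is a hand-built stand-in for a real result event:

```javascript
// Pull the top transcript out of a SpeechRecognition result event.
// getTranscript is a hypothetical helper; the shape it reads follows the API:
// results is a list of SpeechRecognitionResult lists of alternatives,
// each alternative carrying a transcript and a confidence score.
function getTranscript(event) {
  return event.results[0][0].transcript;
}

// A hand-built event standing in for what onresult receives in the browser:
var fakeEvent = { results: [[{ transcript: 'blue', confidence: 0.9 }]] };
console.log(getTranscript(fakeEvent)); // 'blue'
```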

Several times before, when working with "still in production" browser libraries, we have run into issues where they require user interaction in order to start. SpeechRecognition actually has a nice workaround that lets you emulate an always-on environment:

recognition.onresult = function(event) {
  ...
  recognition.start();
}

By running this we can have an always-on sort of interaction (beware the battery life) on a website listening for input from the user. The implications of this are pretty striking.

Usage of .onerror is very important when working with speech recognition: because it is an event-based system, restarting it on failure is going to be crucial until there is a better continuous recognition mode.

recognition.onerror = function(err){
  console.log(err);
  // restart the recognition event.
  recognition.start();
}
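Put together, the restart-on-result and restart-on-error wiring could be collected into one keep-alive wrapper. This is a sketch — `keepListening` is a made-up name, and the fake recognition object below only mimics the `start`/`onend`/`onerror` surface so the wiring can be exercised outside a browser:

```javascript
// Wire up handlers so recognition restarts whenever it stops or errors.
// keepListening is a hypothetical helper, not part of the API.
function keepListening(recognition) {
  recognition.onend = function() {
    recognition.start();
  };
  recognition.onerror = function(err) {
    console.log(err);
    recognition.start();
  };
  recognition.start();
}

// A fake recognition object standing in for a real SpeechRecognition:
var fake = {
  startCount: 0,
  start: function() { this.startCount++; }
};
keepListening(fake);  // starts once
fake.onend();         // simulate the engine stopping -> restarts
console.log(fake.startCount); // 2
```

In a real page this keeps the microphone hot between utterances, which is exactly the always-on (and battery-hungry) behavior described above.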

...Granted, a lot of what we are doing here has been available from Google for a long time, but having the ability to make a voice-cognizant webpage could have great future implications for visually impaired users.

All in all, my first dive into this API has been quite interesting. It's going to be a challenge to get all of the browsers on board with this technology moving forward, especially when you see Chrome moving more heavily toward JavaScript-centric applications for ideas such as this.

I threw together a pretty fun little example of what you could maybe do over at https://voicerecognition.surge.sh.

Sam Clark