Sphinx-UE4 is a speech recognition plugin for Unreal Engine 4. The plugin makes use of the PocketSphinx library. At the moment, this plugin should be used to detect phrases (e.g. "open browser"); single-word recognition is poor. I am looking at ways to improve this to a passable level.
Blueprint-only projects will not package properly with plugins. This is a known issue with Unreal Engine for the time being. To work around it, simply add an empty C++ class to your Blueprint-only project.
When packaging a project, ensure the model folder is included in 'Additional Non-Asset Directories to Copy' (in Project Settings > Packaging).
Speech can control a character: moving them around a soccer field, walking/running, turning, and kicking the ball.
The game timer is set to 3 minutes. A ball of a random colour (either blue or red) will spawn. A goal is scored by kicking the ball into the goal that matches its colour. When a goal is scored, a new ball is spawned.
When the game starts, it is placed into keyword-listening mode. The following phrases are recognized:
- start the game: Starts a game of soccer.
- enable walk: The character starts walking.
- enable sprint: The character starts sprinting.
- turn left: Rotates the character 45 degrees to the left.
- turn right: Rotates the character 45 degrees to the right.
- kick the ball: If a ball is in range, the player kicks the ball.
- one eighty: Rotates the character 180 degrees.
- stop movement: Stops all character movement.
This example demonstrates grammar file support. When a recognition matches a rule in the grammar file, the operation and its result are shown. For example, "three add five" will look like this:
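As an illustration, a PocketSphinx grammar for arithmetic phrases like this would typically be written in JSGF. The grammar and rule names below are hypothetical, sketching the form only; the example project ships its own grammar file:

```
#JSGF V1.0;

grammar arithmetic;

public <command> = <number> <operation> <number>;

<operation> = add | subtract | multiply | divide;
<number>    = one | two | three | four | five | six | seven | eight | nine;
```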
If you click on the character on the map, you can see that Language is a selectable property. There is support for English, Chinese, French, Spanish, and Russian. Take a look at the blueprint to see which words are added for which languages. I can only speak English, so testing of the foreign languages is probably pretty patchy.
An approximate volume of the microphone can be obtained in Blueprints. In this example, the cone radius of the light is driven by the microphone volume.
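The mapping itself is straightforward. Below is a minimal sketch in plain C++; the function name and the 0–100 volume range are assumptions for illustration, not part of the plugin's API:

```cpp
#include <algorithm>
#include <cassert>

// Map an approximate microphone volume (assumed to be 0..100) to a light
// cone radius between a minimum and maximum angle, clamping bad input.
float VolumeToConeRadius(float Volume, float MinRadius = 10.0f, float MaxRadius = 45.0f)
{
    const float Clamped = std::clamp(Volume, 0.0f, 100.0f);
    const float Alpha = Clamped / 100.0f;                // normalise to 0..1
    return MinRadius + Alpha * (MaxRadius - MinRadius);  // linear interpolation
}
```

In a Blueprint-driven project the same lerp-and-clamp can be done with the built-in Map Range Clamped node.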
- Download the code from GitHub.
- Copy the Plugins and Content folders into the project of your choosing.
- From the Binaries folder, copy the appropriate .dll into Plugins\\SpeechRecognition\\Binaries\\Win64.
- Download and extract the following archive into the path "Content/model" within your project: Language Models
- Right-click on the .uproject file and select regenerate solution.
- Open the Visual Studio project, recompile, and open the project in UE.
- Open the project and enable the Speech Recognition plugin.
- Open the blueprint of whichever actor/class you wish to add speech recognition functionality to.
I will now run through the changes necessary:
- When the BeginPlay event fires, create a Speech Recognition actor and save a reference to it. After this, create and bind a method to OnWordsSpoken; this method is triggered each time a recognized phrase is spoken. Lastly, ensure Shutdown (on the speech recognition actor) is called during EndPlay.
- Once the actor has been created, we Initialise it and set configuration parameters for Sphinx. There is a huge range of Sphinx params that can be configured. NOTE: Setting the recognition mode (keyword/grammar) resets any Sphinx params that were previously added, so ensure the Sphinx config params are set again after each change of the recognition mode.
Although it is somewhat dated, the following provides a detailed list of the various Sphinx params.
At the moment, I set the following and would suggest trying the same. I am still experimenting to try and find what works best for me.
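As a purely illustrative starting point (these are common PocketSphinx tuning parameters, not values taken from the plugin), a configuration might touch settings such as:

```
-vad_threshold 2.0    # how much louder than silence a frame must be to count as speech
-remove_noise  yes    # apply spectral noise subtraction to the input
-lw            6.5    # language model weight
```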
- The bound OnWordsSpoken method takes in an array of recognized phrases. Loop over this array to trigger in-game logic.
- On the EndPlay event, make sure the Shutdown method is called; otherwise, crashes will occur if multiple instances start up.
Just as an event fires when speech is detected, there are hooks for other events. Here is a list of all of them:
- OnWordsSpoken: triggered when silence is broken and one or more recognized phrases are detected.
- OnUnknownPhrase: triggered when silence is broken and no recognized phrases are detected.
- OnStartedSpeaking: triggered when silence is broken and speech is detected.
- OnStoppedSpeaking: triggered when speech ends and silence resumes.
Next, we set the recognition mode to Keyword and pass in a set of key phrases.
These are used to determine which phrases are spoken by the player. A Recognition Phrase comprises a string (the phrase we wish to detect) and a tolerance setting. The tolerance determines how easily a phrase triggers. Play around with the tolerance settings to find the balance between sensitivity and false positives.
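For context, PocketSphinx's native keyword-list format expresses this tolerance as a per-phrase detection threshold, one phrase per line. The phrases and values below are illustrative only; lower thresholds make a phrase easier to trigger, at the cost of more false positives:

```
open browser /1e-40/
kick the ball /1e-20/
```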
If your phrase features words that are not in the dictionary, they will not be detected. To add words to the dictionary, open the .dict file that matches the language of your choosing (e.g. English is "Content\\model\\en\\en.dict").
This file contains a list of recognized words: the first token on each line is the word itself, and the remainder is the phonetic pronunciation used to recognize it.
Here are some examples:

```
abbott AE B AH T
ball B AO L
bandit B AE N D AH T
```
Simply add a word in a similar manner, and re-save the file.
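The format is easy to handle programmatically as well. Here is a minimal, self-contained C++ sketch (not part of the plugin) that splits one dictionary line into its word and phoneme list:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Split one .dict line into the recognized word (first token) and its
// phoneme sequence (all remaining tokens).
std::pair<std::string, std::vector<std::string>> ParseDictLine(const std::string& Line)
{
    std::istringstream Stream(Line);
    std::string Word;
    Stream >> Word;                  // first token is the word itself

    std::vector<std::string> Phonemes;
    std::string Phoneme;
    while (Stream >> Phoneme)        // everything after is the phonetics
        Phonemes.push_back(Phoneme);

    return {Word, Phonemes};
}
```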
Create C++ only examples which showcase the plugin:
Currently, I have only included a Blueprint example. I wish to write some C++ examples showing how the plugin can run from a C++ class instead of a Blueprint.
Adding additional languages:
Currently, a number of Sphinx-trained language models exist for languages other than English. If a language is supported by Unreal Engine 4 and a trained model exists, then I will add it.
At the moment, my testing has been anecdotal. I wish to work on improving accuracy, either by tweaking the parameters passed into Sphinx, or by tweaking the tolerance values of the keyword Tolerance enumeration.