Voice Chapter 8 – Assist in the home today

As you have probably already read, we launched our Home Assistant Voice Preview Edition today. It is the culmination of the past several years of open-source software development on Home Assistant’s home-grown voice assistant, Assist. A large group of dedicated developers has been working together on adding and honing its many features, and if it’s been a while since you tried Assist, you should use this launch as a chance to jump back in and see the progress we’ve made.

Home Assistant Voice Preview Edition has been launched to build on this work, continuing the momentum we’ve already built and accelerating our goal of not only matching the capabilities of existing voice assistants but surpassing them. We had an early production run of Voice Preview Edition (a preview preview 😉), and we tried to get them in the hands of as many of our language leaders and voice developers as possible – and we’re already seeing the fruits of their efforts with language support improving over the past month alone!

In this voice chapter, I’d like to highlight all the things you can do with Assist today. I also want to give the state of our development, what the limitations are, and where your support can be best applied.

Assist in the home today

Origins of Assist

Early versions of Assist via chat – things have come a long way

Voice control for Home Assistant goes back further than most people assume, with some of the groundwork we use today being added as far back as 2017. The major turning point came when we refocused our efforts and declared 2023 the Year of the Voice. This was an effort to focus development and find areas where our community could make the most impact. During the Year of the Voice, voice was added to Assist, intents were improved, languages were added, wake words were created, and we established great local and cloud options for running voice. Shortly after the Year of the Voice, many more features were added, including integrated AI, timers, and even better wake words. Year of the Voice got the ball rolling, and Voice Preview Edition will continue its momentum.

Commands

Assist is the underlying technology that allows Home Assistant to turn commands (“turn on the light”) into Actions (light.turn_on). Commands, or as we call them intents, allow you to control pretty much every aspect of your smart home, including on, off, play, pause, next, open, close, and more. We also have intents that give you helpful information like what’s the time, weather, temperature, and so on. Lastly, there are a bunch of other useful miscellaneous things, like adding items to a shopping list and setting timers. If you’re interested, there is a full list here.
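To make the command-to-action idea concrete, here is a minimal sketch in Python. It is not how Assist matches sentences internally; the templates, entity ids, and the `match_command` helper are all made up for illustration, and the only detail taken from the text above is that a phrase like “turn on the light” ends up as an action such as `light.turn_on`.

```python
import re

# Hypothetical sentence templates; these are NOT Assist's real intent definitions.
TEMPLATES = [
    (re.compile(r"^turn on (?:the )?(?P<name>.+)$"), "turn_on"),
    (re.compile(r"^turn off (?:the )?(?P<name>.+)$"), "turn_off"),
]

# Hypothetical mapping from spoken names to entity ids.
ENTITIES = {
    "light": "light.living_room",
    "tv": "media_player.tv",
}


def match_command(text):
    """Return (action, entity_id) for a recognized command, else None."""
    cleaned = text.lower().strip().rstrip(".!?")
    for pattern, verb in TEMPLATES:
        found = pattern.match(cleaned)
        if found is None:
            continue
        entity_id = ENTITIES.get(found.group("name"))
        if entity_id is None:
            return None  # sentence shape matched, but the device is unknown
        domain = entity_id.split(".")[0]
        return (f"{domain}.{verb}", entity_id)
    return None


print(match_command("Turn on the light"))
```

Anything that does not fit a template returns `None`, which is the point where a real system would either give up or hand the sentence to a more flexible agent.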

Timers

When we asked our community, timers were a top-requested ability. You can not only set a timer and pause, increase, decrease, or cancel it, but you can also set commands to trigger after a set amount of time, for example, “turn off the TV in 15 minutes”. You can also just say “Stop”, without a wake word, to silence the timer’s alarm. On our Voice Preview Edition, when you set a timer, the LED ring counts down the last seconds and flashes when it’s done.
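A delayed command can be thought of as an ordinary command plus a delay. The sketch below splits the two apart; the regular expression and the `split_delayed_command` name are assumptions made for this example and say nothing about how Assist parses such sentences.

```python
import re

# Seconds per unit for the few units this sketch understands.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}

# Hypothetical pattern: "<command> in <number> <unit>[s]" at the end of the sentence.
DELAY_RE = re.compile(
    r"^(?P<command>.+) in (?P<amount>\d+) (?P<unit>second|minute|hour)s?$"
)


def split_delayed_command(text):
    """Return (command, delay_in_seconds); the delay is 0 if none was spoken."""
    cleaned = text.strip().rstrip(".!?")
    found = DELAY_RE.match(cleaned)
    if found is None:
        return (cleaned, 0)
    delay = int(found.group("amount")) * UNIT_SECONDS[found.group("unit")]
    return (found.group("command"), delay)


print(split_delayed_command("turn off the TV in 15 minutes"))
```

The command part can then be handled like any other command once the delay has elapsed.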

Exposing devices and Aliases

This sets us apart from other voice assistants: we allow you to expose and effectively hide devices from your voice assistant. For example, you could choose not to expose a door lock but instead just expose the sensor that knows if the door is closed. It puts you in the driver’s seat on what voice can do in your home. We also introduced aliases to allow you to give devices multiple names, allowing you to speak more naturally with Assist.
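As a rough model of exposure and aliases, the sketch below only lets voice see devices that are marked as exposed, and it accepts any alias as a name. The `Device` class, its fields, and `find_exposed` are hypothetical and do not mirror Home Assistant’s data model.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical device record, for illustration only."""
    entity_id: str
    name: str
    exposed: bool = True
    aliases: list = field(default_factory=list)


def find_exposed(devices, spoken_name):
    """Return the entity id of an exposed device matching a name or alias."""
    wanted = spoken_name.lower()
    for device in devices:
        if not device.exposed:
            continue  # hidden devices are invisible to voice
        names = [device.name] + device.aliases
        if wanted in (n.lower() for n in names):
            return device.entity_id
    return None


devices = [
    Device("lock.front_door", "Front door lock", exposed=False),
    Device("binary_sensor.front_door", "Front door", aliases=["entrance"]),
]
print(find_exposed(devices, "Entrance"))
```

Here the lock stays out of reach of voice commands while the door sensor can be addressed as either “front door” or “entrance”.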

Room context

If you tell your Assist hardware what room it is in and ensure other devices are organized by room, you can give commands like “turn off the lights”, and without specifying anything, it will turn off the lights in the room you are in. This feature also works with media players (play/pause/next) and timers.
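Room context boils down to a fallback rule: use the room named in the command if there is one, otherwise use the room the voice hardware sits in. The sketch below assumes a simple dictionary of entities per room; `resolve_targets` and the example data are invented for illustration.

```python
def resolve_targets(entities_by_area, satellite_area, spoken_area=None):
    """Pick the entities a command applies to.

    A room named in the command wins; otherwise fall back to the room
    the voice hardware is assigned to.
    """
    area = spoken_area if spoken_area is not None else satellite_area
    return list(entities_by_area.get(area, []))


# Hypothetical home layout.
lights = {
    "kitchen": ["light.kitchen_ceiling"],
    "office": ["light.desk", "light.shelf"],
}
print(resolve_targets(lights, satellite_area="office"))
```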

Wake words

Our community is donating small amounts of time to improve wake words with our tool.

Wake words are the unique phrases that prompt a voice assistant to listen and start processing a command. Wake words originally had to be processed on Home Assistant via an add-on like openWakeWord, meaning the Assist hardware needed to continuously stream audio to Home Assistant. Shortly after the Year of the Voice, microWakeWord was released, which brought wake word processing on-device for faster responses. It is improving fast thanks to our community using our fast and easy tool to donate samples of their voice. There is a growing list of wake words, and the on-device options include “Okay Nabu” (default and most reliable), “Hey Jarvis”, and “Hey Mycroft”. Both of these wake word engines were built by the Home Assistant community and are open source, giving the world two great free and open wake word engines!

Speech Processing

The Assist pipeline in all its glory

Assist can’t understand spoken words and needs something to take that audio and turn it into text – all this together is called an Assist pipeline. This speech processing is really CPU intensive, so it can’t happen on the Voice Assistant Hardware, and sometimes your Home Assistant system can’t even handle it. One important step we made was adding speech-to-text and text-to-speech capabilities to Home Assistant Cloud, which allows low-powered Home Assistant hardware to offload speech processing to the cloud. Home Assistant Cloud doesn’t store or use this data to train on – clouds don’t get any more private than ours. It is also the most accurate and power-efficient way to process speech. We’ve put considerable effort into local speech processing, building the add-ons and a new protocol they use to speak to Home Assistant, but they are very reliant on language support from the community.
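Conceptually, a pipeline is a chain of interchangeable stages: audio goes in, text comes out, the text is handled as a command, and the reply is turned back into audio. The sketch below shows that shape with stand-in stages; `run_pipeline` and the `fake_*` functions are hypothetical and perform no real speech processing.

```python
from typing import Callable


def run_pipeline(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    handle_intent: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> bytes:
    """Run audio through the three stages and return the spoken reply."""
    text = speech_to_text(audio)
    reply = handle_intent(text)
    return text_to_speech(reply)


# Stand-in stages: bytes are treated as already-transcribed text.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_intent(text: str) -> str:
    if text == "turn on the light":
        return "Turned on the light"
    return "Sorry, I did not understand"


def fake_tts(reply: str) -> bytes:
    return reply.encode("utf-8")


print(run_pipeline(b"turn on the light", fake_stt, fake_intent, fake_tts))
```

Because each stage is just a function argument here, swapping a cloud stage for a local one does not change the rest of the chain.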

Language support

See if your language is supported with our checker.

Assist aims to support more languages than other voice assistants, and this has been a massive undertaking for our community – we need more help. The first step for language support is getting the commands (intents) right, and we have over 25 major languages that are ready to use today. Our wake words are also getting better at understanding different accents thanks to our Wake Word Collective tool.

Text-to-speech

We built our own text-to-speech system, Piper, and it now supports over 30 languages. It’s a fast, local neural network-powered text-to-speech system that sounds great and can run on low-powered hardware (it’s optimized for Pi 4!). It was built with the voices of our community, and if you don’t see your native tongue, add your voice!

Speech-to-text

There’s one area that holds back the rest of our language support more than others, and that’s local speech-to-text. Building a full speech-to-text model needs massive compute resources and terabytes of samples, which is currently outside our reach. We use Whisper for local speech-to-text processing, an open-source project from OpenAI, and we’re grateful it exists. For some languages, it works great and doesn’t require a lot of system resources to run well, but for others, you need a pretty beefy system to get acceptable results. In our opinion, only about 15 languages are ready to be run locally on reasonable hardware (an Intel N100 or better) – that’s why, before you begin dreaming up your perfect all-local setup, we recommend checking language support.

We’re always looking for new solutions for low-powered hardware, and are now building another tool that uses much less complex sentence recognition. This could even run on a Raspberry Pi 4, but it would only be able to identify predefined sentences, so if you go off script you may need to call in an AI to help Assist understand your needs. Our language leaders are hard at work putting together the needed translations, but if you want to learn more visit Rhasspy Speech.
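To illustrate the trade-off of recognizing only predefined sentences, here is a small text-level sketch using Python’s standard `difflib`. It is a stand-in, not the approach of the tool described above: it compares already-transcribed text against a fixed list, and the sentence list, the `recognize` name, and the `0.8` cutoff are assumptions chosen for the example.

```python
import difflib

# Hypothetical list of predefined sentences.
KNOWN_SENTENCES = [
    "turn on the light",
    "turn off the light",
    "what time is it",
]


def recognize(transcript, cutoff=0.8):
    """Return the closest predefined sentence, or None if nothing is close."""
    matches = difflib.get_close_matches(
        transcript.lower(), KNOWN_SENTENCES, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


print(recognize("Turn on the lights"))
print(recognize("tell me a joke"))
```

A near miss such as “turn on the lights” still lands on a known sentence, while an off-script request returns `None`, which is where a fallback such as an AI agent would have to take over.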

Generally, even if your language is supported, you’ll almost always get better results from Home Assistant Cloud. Use the free trial to see what works best for you. You can also use both: we know someone using an automation to switch the Assist pipeline to an all-local setup when their internet is down.

AI and Assist

Our default local conversation agent mixed with AI is great for natural language and speed

Another aspect where we beat the competition hands down is the integration of AI into our voice assistant. You can choose from some of the biggest cloud AI providers like ChatGPT, Google Gemini, and Claude (paid accounts required). You can also run it locally via Ollama if you have a modern graphics card with enough VRAM, allowing you to build the most capable offline voice setup around.

Our intents (Assist’s built-in sentences) are getting better at understanding most commands, but AI processes commands in natural language, meaning if you get the device’s name ever so slightly off, it can still figure things out. It also provides the ability to ask for things outside the built-in intents. For instance, if you tell it “It’s a bit chilly in here”, it could raise the temperature on your thermostat, but it could also forgo any home control and just tell you to put on a jacket – results are not yet consistent. More useful is its ability to take multiple sensors and provide context. For instance, you could ask it for an air quality report, and it could review the CO2 levels and tell you to open a window it observes is shut. All this is experimental, and having an AI control your home is not for everyone, but what’s important is that you have the choice.
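One way to picture combining the built-in sentences with an AI is a simple fallback: try the fast local agent first and only ask the AI when nothing matches. The sketch below shows that arrangement with stand-in agents; `respond`, `local_agent`, and `ai_agent` are hypothetical names, and no real AI provider is called.

```python
def respond(text, local_agent, ai_agent):
    """Try the local agent first; fall back to the AI agent if it has no answer."""
    reply = local_agent(text)
    if reply is not None:
        return ("local", reply)
    return ("ai", ai_agent(text))


# Stand-in agents for illustration only.
def local_agent(text):
    known = {"turn on the light": "Turned on the light"}
    return known.get(text.lower())


def ai_agent(text):
    return f"(AI reply to: {text})"


print(respond("Turn on the light", local_agent, ai_agent))
print(respond("It's a bit chilly in here", local_agent, ai_agent))
```

Exact commands stay fast and predictable this way, and only the open-ended requests pay the cost, and the inconsistency, of an AI round trip.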

Conclusion

So many new innovations and improvements for Assist have happened in the past couple of months, and this speaks to the power of having good hardware to build our software on. Voice Preview Edition is the best open voice hardware available today, and even with it only in the hands of a couple hundred people, it’s making a noticeable difference, whether that’s writing code, improving language support, making blueprints, or even just reporting bugs. The momentum we’ll build having this in the hands of thousands will be game-changing – it’s why we’ve declared that the era of open voice assistants has arrived.

In the comments sections, we always have a couple of people saying, “but I don’t use voice, what about improving (this or that)”. The good news is that improvements to Assist and to Home Assistant’s other features are already happening in tandem (check out our roadmap for the complete picture of our priorities). In the end, only a fraction of our development goes towards voice, and our budget is what Amazon’s voice team probably spends on pizza parties 😆. A great side effect is that the problems we’re solving with voice are benefiting other parts of Home Assistant. For example, our integration of AI was driven by voice.

We really think voice is an integral part of a well-rounded smart home ecosystem. It’s especially important for improving the accessibility of home control to all family members. There need to be real options in the space, most importantly ones that give you full control and a real choice on privacy.

Home Assistant Voice Preview is available at retailers today.

Voice Preview Edition with packaging