Generating realistic speech on-the-fly with Azure Functions and the Cognitive Services Speech APIs

Last week Chloe Condon, a Cloud Developer Advocate for Microsoft, posted a great article and accompanying open source project for helping people handle awkward social situations. The project, which combines Azure Functions, Twilio and a Flic button (available from Shortcut Labs), allows a user to trigger a fake call using a discreet Bluetooth button. The button triggers an Azure function, which in turn uses the Twilio API to make a call to a specified number and play a pre-recorded MP3. You can read much more detail about the project in the great article Chloe wrote about it over on Medium.

As a side note, Chloe describes herself as an ambivert, a term which I will admit I had never come across, but which, after reading the description, fits me to a tee. As with Chloe, people assume I am an extrovert, but whilst I am totally comfortable presenting to a room of conference goers and interacting with folks before and after, I soon find myself needing to recharge my batteries and thinking of any excuse to extricate myself from the situation – even just for a short time. Hence, this project resonated with me (as well as half of Twitter it seems!).

One of the things that struck me when first looking at the app was the fact that a pre-recorded MP3 was needed. Now, this obviously means that you can have some great fun with this, potentially playing your favorite artist down the phone, but wouldn’t it be good if you could generate natural sounding speech dynamically at the point at which you made the call? Step in the Speech service from the Microsoft Cognitive Services suite – this is what I am going to show you how to do as part of this post.

The Speech service has, over the last year or so, gone through some dramatic improvements, with one of the most incredible, from my perspective, being neural voices. This is the ability to have speech generated that is almost indistinguishable from a real human voice. You can read the blog post where neural voices were announced here.

So, based on all of this, what I wanted to achieve was the ability to trigger an Azure function – passing the text to be turned into speech – and have that generate an MP3 file for me and have that available to use immediately.

This is what I am going to show you how to do in this article and below you can hear an example of speech generated using the new neural capabilities of the service.

Let’s get started….


What’s next for the Microsoft Bot Framework?

In the last couple of months the Microsoft Azure Bot Service went GA (Generally Available), which was great news for all of the developers out there who have been using the platform and the associated Bot Builder SDK to build bots that can be surfaced across multiple channels, like Facebook, web chat, Skype, Slack and many more. Production bots hosted on the Azure Bot Service currently use v3 of the SDK, which provides a solid platform for developing all sorts of chat bot scenarios.

Looking ahead, in the last couple of weeks, Microsoft has open sourced the next version, v4, of the SDK which is now under active development on GitHub.  I applaud the Bot Framework team at Microsoft for taking this approach (now becoming more and more common at Microsoft) of developing the SDK in the open and accepting contributions and feedback from the community, helping to ensure the next version builds on the awesomeness of the last.

I should say at this point, the team are very clear that v4 of the SDK is under active development and is therefore in a heavy state of flux and as such should only be used for experimentation purposes right now.  However, this gives us a great opportunity to see the direction of travel for the platform and Microsoft have even shared some of the high level roadmap for what we should expect looking forward over the next few months (again though, this is all subject to change).

Highlights

Here are a couple of highlights (keep reading for some roadmap details further on :))

  • Much closer parity between the available flavors of the SDK – The v3 SDK is available for both C# and Node.js, but there are some key differences right now between the development approaches and some of the features available within each. For example, FormFlow is available within the C# SDK, but not within Node.js. Moving forward it looks like the team are aiming for as close to parity as possible between the SDKs, which will be hugely beneficial for developers, especially those who may end up using both of them.
  • Python and Java are joining the party – To accompany the .NET and JavaScript SDKs, the team are actively working on Python and Java options as well, which is great news and will allow an even broader set of developers to explore the benefits of the platform.  Right now the GitHub pages for Python and Java are not live yet, but keep an eye out for those soon (see the roadmap details below).
  • New middleware capabilities – The current version of the v4 SDK contains a new middleware system, which allows you to create rich plugins for your bot, or more generic middleware that can be used in multiple bots. Every activity that flows in or out of your bot flows through the middleware components, which allows you to build pretty much anything that you need. A couple of examples of middleware that exist right now are implementations for the LUIS and QnAMaker Cognitive Services.

The current roadmap

Obviously, in such an early stage the roadmap is likely to change, but in the spirit of transparency the team have shared some of the milestones that they envisage over the coming weeks and months.  The below is based on the public information the team have shared on the v4 wiki.

  • M1 – February 2018 – Public GitHub repos for C# and JavaScript SDKs.
  • M2 – March 2018 – Further ground work and consolidation of the SDKs, plus the introduction of the Python and Java SDKs.
  • M3 – April 2018 – Potentially this is when the initial API freeze will happen plus work on the migration story from v3 to v4 and helpers for developers relating to this.
  • M4 – May 2018 – Refinements and stabilisation work and this is also when the team are aiming for a broad public preview for the v4 SDK.

Where can I find this stuff?

Right now the .NET and JavaScript v4 SDKs are available on GitHub over at the links below and each has a really helpful wiki showing how the SDKs work right now and these will be kept up to date over time.  So if you are interested, head on over and check out the progress so far.  I for one am really excited to see more of the great work from the team over the next few months!

.NET v4 SDK on GitHub

JavaScript SDK on GitHub

QnAMaker Sync Library v1 and QnAMaker Dialog v3

I am pleased to announce the release of an updated version of the QnAMaker Dialog, allowing you to hook up a Bot Framework Bot and QnAMaker easily, and a brand new open source project, QnAMaker Sync Library, allowing you to sync an external data source to QnAMaker in a snap!

So, let’s look at the two new releases in a little more detail:

QnAMaker Dialog v3

GitHub -> https://github.com/garypretty/botframework/tree/master/QnAMakerDialog
NuGet -> https://www.nuget.org/packages/QnAMakerDialog/

If you haven’t seen the QnAMaker Dialog before, it allows you to take the incoming message text from the bot, send it to your published QnA Maker service, get an answer and send it to the bot user as a reply automatically.  The default implementation is just a few lines of code, but you can also have a little more granular control over the responses from the dialog, such as providing different responses depending on the confidence score returned with the answer from the service.

In the new v3 release, a couple of really significant improvements have been made.

The dialog is now based on v3 of the QnAMaker API (previously it was v1), meaning that when you query your QnAMaker service with the dialog you can now get more than one answer back if multiple answers are found. This means that for queries which return multiple answers with similar confidence scores, you can potentially offer your users a choice of which answer is the best fit for them.

Secondly, v3 of the QnAMaker Service supports the addition of metadata to the items in your knowledgebase and the ability to use this metadata to either filter or boost certain answers. The metadata is just one or more key/value string pairs, so you can add whatever information you like. For example, you might add a metadata item called ‘Category’ and set an appropriate value for each answer, which you can then filter on when querying the service to provide a more targeted experience for your users. The new QnAMaker Dialog release now uses this metadata and allows you to specify metadata items for both filtering and boosting.

More details about the QnAMaker dialog, including code samples for the new features are available over on GitHub.

QnAMaker Sync Library

GitHub -> https://github.com/garypretty/qnamaker-sync
NuGet -> https://www.nuget.org/packages/QnAMakerSync/

When you create a QnAMaker service, you can populate your knowledgebase in a few different ways: manually, by automatically extracting FAQs from a web page, or by uploading a tab separated file. However, many of you will already have your FAQ data held somewhere else, such as on your web site in your CMS or maybe within a CRM system. What happens when you update the information in your other system? You probably need to go and manually update the knowledgebase in your QnAMaker service too, which isn’t great. Added to this is the fact that behind the scenes (as mentioned above in the QnAMaker Dialog section), the QnAMaker service supports adding metadata to your QnA data to help you filter or boost certain answers when querying the service. The big problem right now though is that the QnAMaker portal doesn’t yet support the latest APIs and therefore you can’t add metadata through the UI.

So, what do you do? Well, there is a set of APIs available for you to manage your knowledgebase, which includes metadata support, so you could go and write some code to integrate QnAMaker with your web site or repository – but there is no need now, because the QnAMaker Sync Library should hopefully have you covered!

The C# library means you only need to write the code that gets your QnA items from wherever they are (e.g. FAQ pages on your site) and uses them to build a list of QnAItems (a class included in the library). Once you have this list, you then simply pass it to the QnAMaker Sync library (along with your knowledgebase and subscription ID) and voila, your data will be pushed into the QnAMaker service. What’s more, when you build the list of QnAItems, you pass a unique reference for each item so that it can be identified in your original repository (e.g. a page ID from your web site). These references are used the next time you sync, so that the library knows which items to update and which to delete.
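The reference-based matching described above can be pictured with a short sketch. This is not the QnAMaker Sync Library's own code (the library is C#); it is a minimal Python illustration of the idea, and the names `QnAItem`, `item_id` and `plan_sync` are hypothetical. It assumes that an item whose reference has disappeared from the source should be deleted, and that an unchanged item needs no update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QnAItem:
    item_id: str   # hypothetical: the unique reference from the source repository
    question: str
    answer: str


def plan_sync(source_items, synced_items):
    """Compare the current source items with the previously synced items.

    Items are matched by their unique reference. Returns three lists:
    items to add, items to update, and references to delete.
    """
    source = {item.item_id: item for item in source_items}
    synced = {item.item_id: item for item in synced_items}

    # New references that have never been synced.
    to_add = [source[ref] for ref in source if ref not in synced]
    # Known references whose question or answer has changed.
    to_update = [
        source[ref]
        for ref in source
        if ref in synced and source[ref] != synced[ref]
    ]
    # References that no longer exist in the source repository.
    to_delete = [ref for ref in synced if ref not in source]
    return to_add, to_update, to_delete
```

Keying everything on the reference, and not on the question text, is what lets an edited question be treated as an update instead of a delete followed by an add.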

Full details as well as code samples are available over on GitHub and the library is now available via NuGet as well.

 

Microsoft Bot Framework – Store LUIS credentials in web.config instead of hardcoding in LuisDialog

Recently, I have been working on a release management strategy for bots built with the Bot Framework, using the tools we have in house at Mando where I work as a Technical Strategist. As part of this work I have set up various environments as part of the development lifecycle for our solutions, i.e. local development, CI, QA, UAT, Production etc. One of the issues I hit pretty quickly was the need to point the bot within each environment to its own LUIS model (if you are not familiar with LUIS then check out my intro post here). By default you decorate your LuisDialog with a LuisModel attribute as shown below, which means you need to hardcode your subscription key and model ID.

Obviously this need to hardcode isn’t ideal and I really needed to be able to store my Luis key and ID in my web.config so I could then transform the config file for each environment.

Thankfully this is pretty easy to achieve in Bot Framework using the built-in dependency injection. Below are the steps I took to do this and at the end I will summarise what is happening.

  1. Add keys to your web.config for your Luis subscription key and model Id.
  2. Amend your dialog that inherits from LuisDialog to accept a parameter of type ILuisService.  This can then be passed into the base LuisDialog class. ILuisService itself uses a class, LuisModelAttribute which will contain our key and Id, more on that in a minute.
  3. Next we create an Autofac module, within which we register three types: our Luis dialog, the ILuisService and the LuisModelAttribute. When we register the LuisModelAttribute we retrieve our key and Id from our web.config.
  4. Then, in Global.asax.cs we register our new module.
  5. Finally, in MessagesController, this is how you can create your Luis Dialog.

That’s it.  After those few steps you are good to go.

So, let’s summarise what is happening here. When your application loads, the ILuisService and your Luis dialog are registered with Autofac. Also registered is a LuisModelAttribute, into which we have passed our key and id from our web.config. Once that module has been registered, we can then get the instance of our dialog using scope.Resolve<IDialog<IMessageActivity>>(). This dialog takes an ILuisService as a parameter, but because we have registered that with Autofac as well, this is passed in for us automatically. Finally the ILuisService needs a LuisModelAttribute, which, again, because we have registered this in our module, is handled for us.

Once you have completed the above you can alter your Luis subscription key and model id by simply amending your web.config.
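The chain of dependencies summarised above can be pictured with a small, language-neutral sketch. This is not the Bot Framework or Autofac API (the real code is C#); it is a minimal Python illustration of the same pattern. The class names, the `build_dialog` function and the setting names `LuisModelId` and `LuisSubscriptionKey` are all hypothetical stand-ins.

```python
class LuisModel:
    """Stand-in for LuisModelAttribute: just holds the credentials."""

    def __init__(self, model_id, subscription_key):
        self.model_id = model_id
        self.subscription_key = subscription_key


class LuisService:
    """Stand-in for ILuisService: needs a model to know what to call."""

    def __init__(self, model):
        self.model = model


class MyLuisDialog:
    """Stand-in for a dialog inheriting from LuisDialog."""

    def __init__(self, service):
        self.service = service


def build_dialog(settings):
    """Wire up the dialog from configuration instead of hardcoded values.

    `settings` is any mapping of setting name to value, playing the role
    of the web.config app settings (the key names here are assumptions).
    """
    model = LuisModel(settings["LuisModelId"], settings["LuisSubscriptionKey"])
    service = LuisService(model)
    return MyLuisDialog(service)
```

Because the credentials only enter at the outermost step, each environment can supply its own settings without any change to the dialog itself. A container such as Autofac performs this same wiring for you once the three types are registered.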

TechDays Online 2017 Bot Framework / Cognitive Services now available

This February saw the return of TechDays Online here in the UK, along with other sessions from across the pond in the U.S.  I co-presented 2 sessions on bot framework development along with Simon Michael from Microsoft and fellow MVP James Mann.  The sessions covered some great advice about bot development and dug a little deeper into subjects including FormFlow and the QnA Maker / LUIS cognitive services.

Both sessions are now available to watch online, along with tons of other great content from the rest of the 3 days.

Conversational UI using the Microsoft Bot Framework

Microsoft Bot Framework and Cognitive Services: Make your bot smarter!

Another fellow MVP, Robin Osborne, also recorded some short videos about his experience in building a real world bot for a leading brand, JustEat, so check them out over on his blog too.

Adding rich attachments to your QnAMaker bot responses

Recently I released a dialog, available via NuGet, called the QnAMaker dialog. This dialog allows you to integrate with the QnA Maker service from Microsoft, part of the Cognitive Services suite, which allows you to quickly build, train and publish a question and answer bot service based on FAQ URLs or structured lists of questions and answers.

Today I am releasing an update to this dialog which allows you to add rich attachments to your QnAMaker responses to be served up by your bot.  For example, you might want to provide the user with a useful video to go along with an FAQ answer.

QnA Maker Dialog for Bot Framework

Update: The QnA Maker Dialog v3 is now available.  It adds support for v3 of the Microsoft QnA Maker API, including returning multiple answers and use of metadata to filter / boost answers that are returned.  You can read more about this and a new QnA Maker Sync library that is now also available on the announcement blog here.  Also, I have previously released an update to the QnAMakerDialog which supports adding rich media attachments to your Q&A responses.

The QnA Maker service from Microsoft, part of the Cognitive Services suite, allows you to quickly build, train and publish a question and answer bot service based on FAQ URLs or structured lists of questions and answers. Once published you can call a QnA Maker service using simple HTTP calls and integrate it with applications, including bots built on the Bot Framework.

Right now, out of the box, you will need to roll your own code / dialog within your bot to call the QnA Maker service. The new QnAMakerDialog which is now available via NuGet aims to make this integration even easier, by allowing you to integrate with the service in just a couple of minutes with virtually no code.

The QnAMakerDialog allows you to take the incoming message text from the bot, send it to your published QnA Maker service and send the answer sent back from the service to the bot user as a reply. You can add the new QnAMakerDialog to your project using the NuGet package manager console with the following command, or by searching for it using the NuGet Manager in Visual Studio.

Below is an example of a class inheriting from QnAMakerDialog and the minimal implementation.

When no matching answer is returned from the QnA service a default message, “Sorry, I cannot find an answer to your question.” is sent to the user. You can override the NoMatchHandler method to send a customised response.

For many people the default implementation will be enough, but you can also provide more granular responses for when the QnA Maker returns an answer, but is not confident in the answer (indicated by the score returned in the response, between 0 and 100, with a higher score indicating higher confidence). To do this you define a custom handler in your dialog and decorate it with a QnAMakerResponseHandler attribute, specifying the maximum score that the handler should respond to.

Below is an example with a customised method for when a match is not found and also a handler for when the QnA Maker service indicates a lower confidence in the match (using the score sent back in the QnA Maker service response). In this case the custom handler will respond to answers where the confidence score is below 50, with any above 50 being handled in the default way. You can add as many custom handlers as you want and get as granular as you need.
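The routing idea behind score-based handlers can be sketched as follows. This is not the QnAMakerDialog's implementation (which is C# and attribute-driven); it is a minimal Python illustration, and every name in it is hypothetical. It assumes a handler applies when the score is at or below its maximum, and that the handler with the lowest qualifying maximum wins when several apply.

```python
def default_handler(answer, score):
    """Send the answer back unchanged."""
    return answer


def low_confidence_handler(answer, score):
    """Wrap the answer in a more cautious reply."""
    return f"I am not certain, but this might help: {answer}"


def choose_handler(score, handlers, default):
    """Pick the handler with the lowest maximum score that still covers `score`.

    `handlers` is a list of (max_score, handler) pairs. If no custom
    handler covers the score, the default handler is returned.
    """
    candidates = [(max_score, h) for max_score, h in handlers if score <= max_score]
    if not candidates:
        return default
    # Compare on max_score only, so the handler functions are never compared.
    return min(candidates, key=lambda pair: pair[0])[1]


# One custom handler covering scores up to 50, as in the example above.
handlers = [(50, low_confidence_handler)]
```

With this setup, `choose_handler(30, handlers, default_handler)` selects the low confidence handler, while a score of 80 falls through to the default.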

Hopefully you will find the new QnAMakerDialog useful when building your bots and I would love to hear your feedback. The dialog is open source and available in my GitHub repo, alongside the other additional dialog I have created for the Bot Framework, BestMatchDialog (also available on NuGet).

I will be publishing a walkthrough of creating a service with the QnA Maker in a separate post in the near future, but if you are having trouble with that, or indeed the QnAMakerDialog, in the meantime then please feel free to reach out.

Making Amazon Alexa smarter with Microsoft Cognitive Services

Recently those of us who work at Mando were lucky enough to receive an Amazon Echo Dot to play with and to see if we could innovate with it in any interesting ways. As I have been doing a lot of work recently with the Microsoft Bot Framework and the Microsoft Cognitive Services, this was something I was keen to do. The Echo Dot, hardware that sits on top of the Alexa service, is a very nice piece of kit for sure, but I quickly found some limitations once I started extending it with some skills of my own. In this post I will talk about my experience so far and how you might be able to use Microsoft services to make up for some of the current Alexa shortcomings.