Serving up Lunch Recommendations on Alexa and the Google Assistant

Where should we go for lunch?

If you’ve ever worked in an office and aren’t one of those folks who bring their lunch each day with religious fervor, you probably hear this question pretty often. Although we’ve got a few consistent favorites at the Southern Made office (looking at you, Hillwood Pub), we still struggle to find new places and come up with enough options to find something everyone will like on any given day. So, as with most problems, we figure – why not let the robots do it?

Deciding on lunch is usually a pretty straightforward conversation:

Person 1: How about X?
Person 2: Gross, I can’t believe you like that place.
Person 1: Ok, we could go to Y.
Person 2: I don’t have that much time, let’s go somewhere closer.
Person 1: Maybe Z?
Person 2: Perfect.

So all we need to do is replace Person 1 with a robot that has a great knowledge of local lunch spots. In this blog post we’ll walk through building Lunch Fox, our voice app that runs on both Alexa and Google Assistant devices and helps the office decide where to eat on a daily basis.

Building for Voice Assistants

With neither Alexa nor Google Assistant achieving total market dominance, in order to reach the most people (and please the most coworkers) it makes sense to build voice applications that support both platforms. Amazon provides a suite of development tools and APIs if you’re looking to build with the Alexa Skills Kit, and Google provides a similar set of tools with Dialogflow at the center if you’re looking to build for the Google Assistant.

You know what’s no fun? Having to build what is essentially the same thing twice. You know what else isn’t particularly fun? Using a GUI if you’re used to a text editor or other developer-centric environment. Unfortunately, if you want to rely primarily on Amazon’s or Google’s tooling you’ll be pushed in these directions. As most good developers would do, we went searching the Internet for a Better Way™. It didn’t take us long to find it.

Hello Jovo

Jovo is an open source framework for building voice apps for both the Google Assistant and Alexa from a single codebase. Apps are written in JavaScript, and Jovo provides a command line interface to manage projects from initialization and local testing all the way through to deployment. When you need to tap into a feature or function that is specific to the Assistant or Alexa, Jovo makes that easy to do as well. Although it doesn’t directly support 100% of the functionality provided by Alexa and the Assistant, it covers more than enough for most voice use cases. The project also has a Slack community that is very active and extremely helpful.

The Source of Lunch Fox's Smarts

In order to provide recommendations, we needed a data source that not only provided a comprehensive list of lunch locations but also allowed us to do some filtering based on cost and/or distance. Google’s Place Search provided exactly this sort of data.

Using the request-promise-native library our API request ends up looking something like this:

const requestPromise = require('request-promise-native');

const options = {
    uri: 'https://maps.googleapis.com/maps/api/place/nearbysearch/json',
    qs: {
        keyword: 'lunch',
        location: '30.669889,-81.451439', // 'latitude,longitude'
        radius: 10000,                    // meters
        maxprice: 5,
        opennow: 'true',
        key: '<your-api-key>'
    },
    json: true
};

This will give us 20 lunch spots near the specified location, with the ability to page through results if more are needed.
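Paging works by reading the `next_page_token` field from one response and sending it back as the `pagetoken` query parameter on the next request. A small helper for that might look like this (the function name is ours, not part of any library):

```javascript
// Hypothetical helper: merge a page token from a previous Places API
// response into the query string for the next request. Nearby Search
// returns `next_page_token` in its JSON body and expects it back as
// the `pagetoken` query parameter.
function withPageToken(options, nextPageToken) {
  if (nextPageToken == null) {
    return options;
  }
  // Return a copy so the original options object stays untouched
  return {
    ...options,
    qs: { ...options.qs, pagetoken: nextPageToken }
  };
}
```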

Building With Jovo

Jovo has great documentation, and getting up and running with some boilerplate for us to work from was easy. We simply followed their installation docs and then cloned their Hello World template.

Voice applications, and more specifically those built with Jovo, work using the concept of “intents.” Intents are mapped to specific methods in your Jovo app, and you can define how those intents are accessed by users. Usually an intent will be accessed based on what the user says, and may include certain variables that you want to pass through to the method that handles the intent. You can find out more about how to define these phrases and variables using Jovo’s language model. Jovo also provides a number of helpful built-in intents such as LAUNCH and END that map to the entry and exit points of your voice app.
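To give a feel for what a language model looks like, here is a minimal sketch of a Jovo model file (e.g. models/en-US.json). The invocation name and phrases below are illustrative, not Lunch Fox’s actual model:

```json
{
  "invocation": "lunch fox",
  "intents": [
    {
      "name": "GetRecommendationIntent",
      "phrases": [
        "where should we go for lunch",
        "give me a lunch recommendation"
      ]
    }
  ]
}
```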

A Proof of Concept

Since we were new to Jovo, our first step was simply to put together a working proof of concept for Lunch Fox. More specifically, we wanted the user to be able to invoke the voice application and then hear a random recommendation from a static area.

To do this we only needed to create a single custom intent to make our API request and then return the desired speech including the result to the user. The logic in our app.js file looked about like this:

app.setHandler({
  async LAUNCH() {
    return this.toIntent('GetRecommendationIntent');
  },

  async GetRecommendationIntent() {
    const googleApiKey = process.env.GOOGLE_API_KEY;
    const recommendation = await getLunchRecommendation(googleApiKey, '30.669889,-81.451439', 10000, 5);

    const randomRecommendation = recommendation.results[Math.floor(Math.random() * recommendation.results.length)];

    this.tell(`You should get lunch at ${randomRecommendation.name}`);
  }
});

async function getLunchRecommendation(apiKey, location, maxRadius, maxPrice, nextPageToken) {
  const requestPromise = require('request-promise-native');

  const options = {
      uri: 'https://maps.googleapis.com/maps/api/place/nearbysearch/json',
      qs: {
          keyword: 'lunch',
          location: location,
          radius: maxRadius,
          maxprice: maxPrice,
          opennow: 'true',
          key: apiKey
      },
      json: true
  };

  // The Places API expects the page token in the query string as `pagetoken`
  if (nextPageToken != null) {
    options.qs.pagetoken = nextPageToken;
  }

  const lunchResults = await requestPromise(options);

  return lunchResults;
}

Trying It Out

Before we went any further with refinement we wanted to make sure Jovo actually did what it said it would do…namely, create something that could be deployed to both Alexa and the Google Assistant. If you’ve never worked with a voice app before, you have to do some initial setup in the Alexa Developer Console and/or Dialogflow. Generally, you’ll have to create a new app in each of these as a placeholder.

Once you have your skill/action set up at Amazon or Dialogflow, Jovo provides a built-in server that exposes a webhook routed to your local copy of the app, so you can quickly test a development version of your app this way. To deploy, you first create a build of your app, then use the deploy command to upload it. In theory this should work automatically; however, we were never able to get the Dialogflow build to successfully upload and train, and had to manually import the zip it generated instead. Jovo provides extensive tutorials for this entire process for both Alexa and Dialogflow that were extremely helpful.
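The basic workflow with the Jovo CLI looks like this (command names from the Jovo 2.x CLI; flags and behavior may differ between versions):

```shell
# Serve your local copy of the app through the Jovo webhook for testing
jovo run

# Generate the platform-specific project files
# (Alexa skill manifest, Dialogflow agent) from the Jovo project
jovo build

# Upload the generated files to the Alexa Developer Console and Dialogflow
jovo deploy
```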

Once we uploaded our voice app it was easy to test in a web browser using the tools provided by Amazon and Google.

Making it Personal

While the Southern Made team will almost always want lunch recommendations in Nashville, we wanted Lunch Fox to be something we could release for everyone to use. This meant we needed to be able to determine where the user (or more specifically, the user’s device) was when they used Lunch Fox. To get the location, we have to call a different method depending on whether the user is on an Alexa or a Google Assistant device. Jovo makes this easy by providing the this.isAlexaSkill() and this.isGoogleAction() methods, along with shortcuts to each platform’s specific permission request methods. We knew we wanted to get the user’s location immediately on launch, so our resulting app logic ended up looking something like this:

app.setHandler({
  async LAUNCH() {
    if (this.isAlexaSkill()) {
      return this.toIntent('GetAlexaLocationIntent');
    } else if (this.isGoogleAction()) {
      return this.toIntent('GetGoogleLocationIntent');
    }
  },

  async GetAlexaLocationIntent() {
    try {
      const googleApiKey = process.env.GOOGLE_API_KEY;
      const address = await this.$alexaSkill.$user.getDeviceAddress();
      const formattedAddress = `${address.addressLine1} ${address.city}, ${address.stateOrRegion} ${address.postalCode}`;
      const geocodedAddress = await geocodeAddress(googleApiKey, formattedAddress);

      this.$session.$data.geoLocation = geocodedAddress;

      return this.toIntent('GetRecommendationIntent');
    } catch (error) {
      this.$alexaSkill.showAskForAddressCard()
        .tell('Please grant access to your address in the Alexa app to get recommendations in your area.');
    }
  },

  GetGoogleLocationIntent() {
    this.$googleAction.askForPreciseLocation('To find lunch we need to know where you are');
  },

  ON_PERMISSION() {
    if (this.$googleAction.isPermissionGranted()) {
      const device = this.$googleAction.getDevice();

      const geocodedAddress = `${device.location.coordinates.latitude},${device.location.coordinates.longitude}`;
      this.$session.$data.geoLocation = geocodedAddress;

      return this.toIntent('GetRecommendationIntent');
    } else {
      return this.toIntent('GetGoogleLocationIntent');
    }
  }
});

For Alexa, we try to read the device address, and if permission is denied we use the showAskForAddressCard() method (provided by Jovo) to invoke the permission prompt on the user’s Alexa device/app. Once we receive the user’s address information we must pass it to a custom method that calls the Google Geocoding API since our recommendation lookup requires latitude and longitude.

For Google, we invoke the askForPreciseLocation() method (provided by Jovo) to request the required permission. The ON_PERMISSION callback alerts us when the user has provided us with some permissions, and we verify we have the latitude and longitude we need.

In both cases we store the resulting geolocation in the session using the built-in methods provided by Jovo. This allows us to reuse the geolocation across multiple requests from the same user. Once it’s stored, we pass the user directly to our GetRecommendationIntent to provide them with a result.

Letting the User Refine Results

Some days you feel like eating on the cheap. Some days you want whatever is closest/quickest. We wanted to make sure Lunch Fox could allow users to account for these sorts of changing preferences.

The Google Places API outlined above gives us a couple of levers to play with, namely the maxprice and radius parameters. Depending on the first result given to the user, we wanted them to be able to say things like “that’s too expensive” or “that’s too far away,” allowing Lunch Fox to continue narrowing in on a great recommendation. In order to do that, we need to keep track of the user’s current cost and distance preferences within the session.

In order to allow a user to refine a preference, we also have to make sure we set initial defaults. And we wanted those defaults to be set no matter which intent the user entered through. We leveraged Jovo’s ON_REQUEST intent to set our defaults, and then added two new intents to handle requests for something “closer” or “cheaper.”

ON_REQUEST() {
  // Runs on every request, so defaults exist no matter which intent the user hits first
  if (this.$session.$data.maxRadius == null) {
    this.$session.$data.maxRadius = 10000;
  }
  if (this.$session.$data.maxCost == null) {
    this.$session.$data.maxCost = 5;
  }
},

GetCloserRecommendationIntent() {
  const currentRadius = this.$session.$data.maxRadius;
  // Shrink the search radius, but never below 1000 meters
  this.$session.$data.maxRadius = Math.max(currentRadius - 2000, 1000);

  return this.toIntent('GetRecommendationIntent');
},

GetCheaperRecommendationIntent() {
  const currentMaxCost = this.$session.$data.maxCost;
  // Lower the price ceiling, but never below 0
  this.$session.$data.maxCost = Math.max(currentMaxCost - 1, 0);

  return this.toIntent('GetRecommendationIntent');
}

Each of our two new intents updates the session values for radius and/or cost, and then passes the user to our GetRecommendationIntent, which then uses those session values when making the Google Places API request. We mapped a number of triggering phrases to these two new intents to help give the most conversational experience possible.
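To make that last step concrete, here is a sketch of how the session values might feed into the Places request. buildPlacesQuery is a hypothetical helper name of ours, not part of Jovo or the Places client:

```javascript
// Hypothetical helper: build the Places API query string from the user's
// session preferences, falling back to the same defaults used at launch.
function buildPlacesQuery(sessionData, apiKey) {
  return {
    keyword: 'lunch',
    location: sessionData.geoLocation,
    radius: sessionData.maxRadius != null ? sessionData.maxRadius : 10000,
    maxprice: sessionData.maxCost != null ? sessionData.maxCost : 5,
    opennow: 'true',
    key: apiKey
  };
}
```

Inside GetRecommendationIntent, the result of a helper like this would be passed as the qs option of the request shown earlier.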

Filling in the Gaps

In order to pass validation with Alexa (not to mention the fact it’s just a good idea) you have to provide users with the ability to close your app using “Stop” or “Cancel”, as well as give them helpful information if they say “Help”. Jovo provides a built-in intent to handle Stop/Cancel – but you will have to create your own intent for help. You can customize all of this in your Jovo config like so:

intentMap: {
    'AMAZON.CancelIntent': 'END',
    'AMAZON.StopIntent': 'END',
    'AMAZON.HelpIntent': 'HelpIntent'
}
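The custom HelpIntent handler itself just needs to explain the app and keep the session open. A sketch (the wording is illustrative, not Lunch Fox’s actual copy) using Jovo’s this.ask(), which keeps the session open for a reply, unlike this.tell(), which ends it:

```javascript
// Illustrative HelpIntent handler; ask() takes a speech string and a
// reprompt string, and leaves the session open for the user's answer.
const helpHandler = {
  HelpIntent() {
    this.ask(
      'Lunch Fox recommends nearby lunch spots. Ask for a recommendation, ' +
        'or say cheaper or closer to refine one.',
      'Would you like a lunch recommendation?'
    );
  }
};
```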

Hosting Lunch Fox

Up until this point we had been using the built-in Jovo server to fulfill all requests coming from the Google Assistant or Alexa. While that’s great for testing, it’s a total non-starter for something in production.

Jovo provides documentation for a number of hosting options. If you are targeting one platform in particular, it often makes sense to use that platform’s native tooling, e.g. AWS Lambda for Alexa or Google Cloud Functions for the Assistant. At Southern Made we host many of our apps on Heroku – and we felt that was a good fit for Lunch Fox as well.

Deployment to Heroku was straightforward: your voice application deploys just as any Node.js app would. When not using Lambda, Alexa also requires that you validate that requests are actually coming from Amazon and not just someone randomly pinging your service. Jovo has put together a tutorial that addresses how to handle this. In our experience the tutorial didn’t work exactly as advertised, but with some help from the community in the Jovo Slack we were able to solve our issues.
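For Heroku specifically, the only extra piece beyond a standard Node.js setup is a Procfile pointing at the app’s entry point (assuming the default index.js from the Jovo template):

```
web: node index.js
```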

This gave Lunch Fox a single backend on Heroku to easily service both Alexa and Google Assistant requests.

Submitting Your Skill/Action

We wanted to make sure others could use Lunch Fox as well, and this meant submitting our voice application for review on both the Alexa and Google Assistant platforms. Each service will walk you through what is required, but be prepared to fill out quite a bit of information about your voice application. You’ll need logos and descriptions, and you’ll have to select various categories as well as the countries in which your app will be available.

Although we’ve seen some reports of review taking significant time, we received reviews in under 24 hours on both services, for both initial and follow-up versions.

Voice App Development at Southern Made

At Southern Made we continue to work with clients to deploy voice applications as well as find new and novel uses for digital assistants.

Sometimes a concept is simple – and we end up building directly within the Alexa Developer Console or Dialogflow itself. However, when we want to do something more complex – especially something that requires data persistence, third-party API integration, and cross-platform support – Jovo has proved to be a great tool for our development teams.
