I recently launched a side-project I had been working on for the last several months, Chord Assist. Whilst this guitar is made up of many different moving parts, at the centre is the brain of the conversational tool. I made use of Dialogflow to build this functionality, but I also needed a server-side component to handle the logic of the conversational tool that could not be handled inside of Dialogflow itself (known as fulfilment). For this I decided to use the Kotlin Client Library for Actions on Google!
Note: This project is completely open-source and can be found here. You can also view the client library details here.
For previous fulfilment requirements I made use of Firebase Functions and JavaScript – this was a great approach and allowed me to easily get some backend logic implemented and deployed. Don’t get me wrong, I love Firebase Functions and enjoy writing JavaScript, and if a tool gets the job done then it’s fine by me! However, some languages allow you to write more concise and readable code – which is where Kotlin comes in. When I heard about the Kotlin Client Library for Actions on Google I got pretty excited to use it. I think this will be a more accessible entry point for many communities, not only due to a sense of familiarity but also due to the simplicity of implementation.
One thing that instantly comes to mind when comparing the two is the handling of asynchronous requests. Whilst using Promises in JavaScript may not be a trivial task for everyone, the use of these within Dialogflow intents is also not clearly documented. When using the Kotlin Client Library, you do not need to treat intents that handle asynchronous code differently – the response will be returned to the client when it is ready. Because a lot of fulfilments will be using async code, I feel this is a huge win in terms of simplicity when it comes to implementation.
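To make that concrete, here's a minimal sketch – not code from Chord Assist, and jumping ahead slightly to the library constructs covered later in this post – of an intent handler that simply blocks on a hypothetical asynchronous lookup and returns once the value is ready:

```kotlin
// Sketch only: lookupChordDetails() is a hypothetical helper returning a
// CompletableFuture<String>. The handler just waits for the result and the
// library sends the response once the function returns - no Promise plumbing.
@ForIntent("example.async")
fun exampleAsync(request: ActionRequest): ActionResponse {
    val responseBuilder = getResponseBuilder(request)
    val result = lookupChordDetails().get() // block until the async work completes
    responseBuilder.add(
        SimpleResponse().setDisplayText(result).setTextToSpeech(result))
    return responseBuilder.build()
}
```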
As a result, the Kotlin Client Library provided a simple way for me to write more concise and readable code for my conversational fulfilment.
I hope this post inspires you to want to do the same! And with that said, let’s take a dive into how the Chord Assist conversational tool was built.
We start off with our application class, which extends the provided DialogflowApp class. I don’t want to dive too much into it here, but it’s important to know that we must provide a class which does so, as this is what will be used to handle the fulfilment of our conversational tool.
```kotlin
class ChordAssistApp : DialogflowApp() { }
```
If you jump into the source of the DialogflowApp class you’ll notice a few things that it gives us access to. To keep things simple in this post, the main thing that we’ll be making use of here is the getResponseBuilder() function. This is essentially a helper function that takes the request from the current intent and uses the data within it to instantiate a response object with the required conversation data. At that point we have everything that we need to build a response, except for the content that is to be served to the user.
```kotlin
open class DialogflowApp : DefaultApp() {

    // ...

    override fun getResponseBuilder(request: ActionRequest): ResponseBuilder {
        val responseBuilder = ResponseBuilder(
            usesDialogflow = true,
            conversationData = request.conversationData,
            sessionId = request.sessionId,
            userStorage = request.userStorage)
        return responseBuilder
    }
}
```
Throughout this class we can now make use of getResponseBuilder() when handling each of our intents. Let’s jump on over to the next section to see how we can use it to serve a response to our user.
To begin with, your conversational tool is bound to have some form of textual content to display. We’re not going to want to hardcode these pieces of textual content into our project – this not only restricts us from localising our tool, but it also makes it harder to maintain down the line. For this reason we’re going to make use of the ResourceBundle functionality, which is just a standard piece of functionality from Java.
```kotlin
private fun getResource(label: String): String {
    return ResourceBundle.getBundle("resources").getString(label)
}
```
This ResourceBundle class allows us to retrieve a localised strings file for the textual content of our tool. The function that I call above uses the default locale that is supported by the tool, so in this case it is going to load a resources file called resources_en_US – we provided the resources prefix when we called the getBundle() function. There is also a getBundle() function that allows us to pass in the prefix alongside a Locale instance, meaning that we could make use of the user’s locale to fetch the string for use.
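As a rough sketch of that overload (the ResourceBundle call itself is standard Java; how you obtain the user's locale is left out here), it might look something like this:

```kotlin
// Sketch only: resolve strings against a given Locale rather than the default.
private fun getResource(label: String, locale: Locale): String {
    return ResourceBundle.getBundle("resources", locale).getString(label)
}

// e.g. getResource("label_open", Locale.forLanguageTag("en-US")) returns "open"
```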
Once we have the ResourceBundle we can then use getString() to fetch the desired string – when we call this we pass in the reference for the desired string. For example, in my resources file I have three strings labelled as:
```
label_open=open
label_muted=muted
label_fret=fret
```
Now with my helper function in place I can call getResource("label_open"), which would return "open" for the en_US locale. I covered this first here as you’ll see this function used throughout the code in this article.
At this point we’re ready to define our first point of interaction with our conversational tool. Within our class we define each intent handler within a function, annotated using the @ForIntent annotation – with this we just need to provide the name of the intent and in this case we’re defining the welcome intent.
Each one of these intent functions must take an ActionRequest as a parameter and return an ActionResponse instance. The ActionRequest is used to provide us with all of the details regarding the current conversation – this includes things such as the user query, user details, contexts and so on. Once we’ve made use of anything that we require from that, we must create a new instance of a ResponseBuilder from the super class and modify it to our needs. We’ll then return this instance when we’re done handling the intent.
@ForIntent("welcome") fun welcome(request: ActionRequest): ActionResponse { val responseBuilder = getResponseBuilder(request) val response = getResource("welcome_response") responseBuilder.add( SimpleResponse().setDisplayText(response).setTextToSpeech(response)) if (request.hasCapability(Capability.SCREEN_OUTPUT.value)) { responseBuilder.add(Suggestion() .setTitle(getResource("suggestion_teach_me_a_chord"))) responseBuilder.add(Suggestion() .setTitle(getResource("suggestion_available_chords"))) responseBuilder.add(Suggestion() .setTitle(getResource("suggestion_tune_guitar"))) responseBuilder.add(Suggestion() .setTitle(getResource("suggestion_what_can_you_do"))) } return responseBuilder.build() }
You’ll notice here that we begin by retrieving an instance of the response builder that we previously touched on – this is so that we can prepare the response for the device handling the conversation. We then add a simple textual response to our intent – all responses require a SimpleResponse so that devices that can’t handle rich content (anything that requires a screen / speaker) are able to fulfil the intent for the user. Within this SimpleResponse class we assign both display text (text that is to be shown to the user) and spoken text (text that is to be read out by the assistant) – these do not have to match if you wish to have the assistant only read out part of the content.
```kotlin
responseBuilder.add(
    SimpleResponse().setDisplayText(response).setTextToSpeech(response))
```
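As a quick aside, here's a hypothetical example (these strings aren't from the Chord Assist resources) where the assistant only speaks part of what is shown on screen:

```kotlin
// Hypothetical example: show the full message on screen, but only have the
// assistant read the first sentence aloud.
responseBuilder.add(
    SimpleResponse()
        .setDisplayText("Welcome to Chord Assist! Pick a suggestion below to get started.")
        .setTextToSpeech("Welcome to Chord Assist!"))
```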
Our welcome intent is pretty simple, so the only other thing we want to do is provide some suggestion chips to make the interaction with our conversation easier for the user. Before we do this though, we need to make use of the hasCapability() function on our ActionRequest to check whether the device is capable of handling rich responses. Here we simply check that the Capability.SCREEN_OUTPUT capability is available and, if so, continue to add our rich response.
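Pulled out of the handler above, that check looks like this:

```kotlin
if (request.hasCapability(Capability.SCREEN_OUTPUT.value)) {
    // add rich content (suggestion chips, cards, etc.) for devices with a screen
}
```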
For this rich response we add what are known as Suggestions. These are chips shown at the bottom of the conversation that provide quick interaction points for the user – here we display four options to prompt the user onto the next part of the conversation. For this we use the Suggestion class and use the setTitle function to apply some textual content to the chip.
```kotlin
responseBuilder.add(
    Suggestion().setTitle(getResource("suggestion_teach_me_a_chord")))
```
Now that we’re done, we’re ready to return our response. Here we simply need to call build() on our ResponseBuilder and return the result from our function.
```kotlin
return responseBuilder.build()
```
For the next intent, we’re going to handle the case where the user can ask what Chord Assist is capable of doing. This is a nice way of helping out the user without overwhelming them when they first trigger the conversation.
You’ll notice that this intent is pretty similar to the last one:
- Define a function using the @ForIntent annotation
- Retrieve a response builder instance
- Build the response with simple and rich content
- Return the response to the assistant
@ForIntent("available.tasks") fun availableTasks(request: ActionRequest): ActionResponse { val responseBuilder = getResponseBuilder(request) val response = getResource("available_tasks_response") responseBuilder.add( SimpleResponse().setDisplayText(response).setTextToSpeech(response)) if (request.hasCapability(Capability.SCREEN_OUTPUT.value)) { responseBuilder.add(Suggestion() .setTitle(getResource("suggestion_teach_me_a_chord"))) responseBuilder.add(Suggestion() .setTitle(getResource("suggestion_available_chords"))) responseBuilder.add(Suggestion() .setTitle(getResource("suggestion_tune_guitar"))) } return responseBuilder.build() }
We again use this opportunity to build a collection of Suggestion chips for the user.
For the next intent we’re going to handle the request where the user wants to learn a chord.
As per the code below you will notice there are 3 entry points for this:
- The initial request to learn a chord – the learn.chord intent
- The repeat request for a chord – the learn.chord – repeat intent
- The fallback where something doesn’t go as planned – the learn.chord – fallback intent
@ForIntent("learn.chord") fun learnChord(request: ActionRequest): ActionResponse { return handleChordRequest(request, request.getParameter(PARAMETER_CHORD) as String? ?: "") } @ForIntent("learn.chord - repeat") fun repeatChord(request: ActionRequest): ActionResponse { return handleChordRequest(request, request.getContext(CONTEXT_LEARN_CHORD_FOLLOWUP)?.parameters?.get( PARAMETER_CHORD) as String) } @ForIntent("learn.chord - fallback") fun learnChordFallback(request: ActionRequest): ActionResponse { return handleChordRequest(request, request.getArgument(PARAMETER_CHORD)?.name ?: "") }
Things here mostly look the same, except for the second parameter in each of the handleChordRequest calls. For the initial intent we make use of the getParameter function from the ActionRequest instance, while for the repeat intent we pull the same value out of the parameters of the followup context. Either way, this gives us the entity value that the user has provided to our intent – in this case a chord such as C or D. The learnChordFallback function takes a slightly different approach – because we are handling a fallback intent, the value is retrieved using the getArgument function on our ActionRequest instance instead.
Both of these approaches give us access to the requested chord so that we can continue to fulfil the request for the user.
Before we can go on to do so, we need to do some more setup in our class. I make use of Firebase within my project and this intent is going to need access to it. Here, I define a new function called getDatabase that will retrieve a Firestore instance for use within the intent. If the Firebase app has not yet been configured and initialised then that will be done here – this helps us to avoid any unnecessary reinitialisation, as this code may be called many times during the lifetime of the conversation.
```kotlin
private fun getDatabase(): Firestore {
    if (FirebaseApp.getApps().isEmpty()) {
        val credentials = GoogleCredentials.getApplicationDefault()
        val options = FirebaseOptions.Builder()
            .setCredentials(credentials)
            .setProjectId(PROJECT_ID)
            .build()
        FirebaseApp.initializeApp(options)
    }
    return FirestoreClient.getFirestore()
}
```
Now that we have access to our Firestore instance, we can go ahead and make use of it! From our intent functions we used the provided data to call the handleChordRequest function – this is what is going to fetch the requested chord data and present it to the user.
```kotlin
private fun handleChordRequest(request: ActionRequest, chord: String): ActionResponse {
    val responseBuilder = getResponseBuilder(request)

    if (chord.isNotEmpty()) {
        val document = getDatabase().collection(COLLECTION_CHORDS)
            .document(chord).get().get()
        val chordInstructions = buildString(document?.getString(FIELD_PATTERN) ?: "") + ". "

        responseBuilder.add(
            SimpleResponse()
                .setDisplayText(chordInstructions)
                .setTextToSpeech(chordInstructions)
        )

        if (request.hasCapability(Capability.SCREEN_OUTPUT.value)) {
            responseBuilder.add(
                BasicCard()
                    .setTitle(getResource("learn_chord_title").format(chord))
                    .setImage(
                        Image()
                            .setUrl(document.getString("image") ?: "")
                            .setAccessibilityText(
                                getResource("learn_chord_title").format(chord))
                    )
            )
            responseBuilder.add(Suggestion()
                .setTitle(getResource("suggestion_repeat")))
            responseBuilder.add(Suggestion()
                .setTitle(getResource("suggestion_teach_another")))
        }

        return responseBuilder.build()
    } else {
        responseBuilder.add(ActionContext(CONTEXT_LEARN_CHORD_FOLLOWUP, 5))

        val response = getResource("learn_chord_unknown_response")
        responseBuilder.add(
            SimpleResponse().setDisplayText(response).setTextToSpeech(response)
        )

        if (request.hasCapability(Capability.SCREEN_OUTPUT.value)) {
            responseBuilder.add(Suggestion().setTitle("Show me available chords"))
        }
    }

    return responseBuilder.build()
}
```
Most of this function uses standard Firebase interactions – I use Firestore to build a query, execute it and fetch the data:
```kotlin
val document = getDatabase().collection(COLLECTION_CHORDS)
    .document(chord).get().get()
val chordInstructions = buildString(document?.getString(FIELD_PATTERN) ?: "") + ". "
```
Once the data has been retrieved, we initially return a SimpleResponse to the conversation. Next, if the device has the screen output capability, we use the response builder to add a BasicCard instance. For this we set a title and provide an image of the chord – the Image class allows us to assign a hero image to our BasicCard, loaded from the URL stored in our Firestore document.
```kotlin
responseBuilder.add(
    BasicCard()
        .setTitle(getResource("learn_chord_title").format(chord))
        .setImage(
            Image()
                .setUrl(document.getString("image") ?: "")
                .setAccessibilityText(
                    getResource("learn_chord_title").format(chord))
        )
)
```
You may have also noticed that we again use Suggestion chips to provide further conversation interaction points for the user. Again, these are great to provide for most intents where a screen is available.
```kotlin
responseBuilder.add(Suggestion()
    .setTitle(getResource("suggestion_repeat")))
responseBuilder.add(Suggestion()
    .setTitle(getResource("suggestion_teach_another")))
```
Within the learn chord function you may have seen some functions which build strings for user output. I’ve provided these here for completeness – they are not a part of the client library, just some standard Kotlin code:
```kotlin
private fun buildString(sequence: String): String {
    var chordSequence = ""
    for (index in sequence.indices) {
        var note = chords[index] + " " + buildNote(sequence[index].toString())
        if (sequence[index] != 'X' && sequence[index] != '0') note += " " + sequence[index]
        if (index < sequence.length - 1) note += ", "
        chordSequence += note
    }
    return chordSequence
}

private fun buildNote(note: String): String {
    return when (note) {
        "X" -> getResource("label_muted")
        "0" -> getResource("label_open")
        else -> getResource("label_fret")
    }
}
```
Within the conversational tool there is also an intent which can be used to list the available chords to the user. The user might be new to guitar chords, or simply want to see the options that are available for them to learn.
This intent is similar to the previous one – we fetch data from Firestore for display – the only difference is that we fetch all documents from the collection rather than querying for a specific chord.
@ForIntent("available.chords") fun showAvailableChords(request: ActionRequest): ActionResponse { val responseBuilder = getResponseBuilder(request) val query = getDatabase().collection(COLLECTION_CHORDS).get().get() val documents = query.documents val rows = mutableListOf<TableCardRow>() var text = "" documents.forEach { val displayName = it.getString(FIELD_DISPLAY_NAME) text += "$displayName, " rows.add(TableCardRow() .setCells(listOf( TableCardCell().setText(displayName), TableCardCell().setText(it.getString(FIELD_PACK))) )) } text = text.substring(0, text.length - 2) val response = getResource("available_chords_response") + text responseBuilder.add( SimpleResponse().setDisplayText(response).setTextToSpeech(response)) if (request.hasCapability(Capability.SCREEN_OUTPUT.value)) { responseBuilder.add( TableCard() .setTitle(getResource("available_chords_table_title")) .setSubtitle(getResource("available_chords_table_description")) .setColumnProperties( listOf(TableCardColumnProperties().setHeader( getResource("available_chords_table_chord_header")), TableCardColumnProperties().setHeader( getResource("available_chords_table_pack_header")))) .setRows(rows) ) responseBuilder.add(Suggestion() .setTitle(getResource("suggestion_teach_me_a_chord"))) } return responseBuilder.build() }
Skipping past the stuff that we’ve already covered in this article, you can see that I’ve made use of the TableCard class. This class allows us to define a collection of TableCardRow instances – each row holds a list of TableCardCell instances containing the text to be shown within the table.
```kotlin
val rows = mutableListOf<TableCardRow>()

rows.add(TableCardRow()
    .setCells(listOf(
        TableCardCell().setText(displayName),
        TableCardCell().setText(it.getString(FIELD_PACK))
    )))
```
At this point we have our rows, but we need to place them within a table. For this we use the TableCard, assign some details to it and then set its column properties – this is the point where we assign the number of columns that our table has. Here, each TableCardColumnProperties instance defines a table header. We then use the setRows function to make use of the row items that we previously created.
```kotlin
responseBuilder.add(
    TableCard()
        .setTitle(getResource("available_chords_table_title"))
        .setSubtitle(getResource("available_chords_table_description"))
        .setColumnProperties(
            listOf(
                TableCardColumnProperties()
                    .setHeader(getResource("available_chords_table_chord_header")),
                TableCardColumnProperties()
                    .setHeader(getResource("available_chords_table_pack_header"))))
        .setRows(rows)
)
```
We again make use of the Suggestion class here to provide a way for the user to easily learn another chord.
```kotlin
responseBuilder.add(Suggestion()
    .setTitle(getResource("suggestion_teach_me_a_chord")))
```
As well as learning chords, we also provide the ability for the user to tune their guitar – for this we need to make use of audio files. When the user requests to tune their guitar, we ask which note they want the assistant to play and then present it to them in audio format.
@ForIntent("play.note") fun playNote(request: ActionRequest): ActionResponse { val responseBuilder = getResponseBuilder(request) if (!request.hasCapability(Capability.MEDIA_RESPONSE_AUDIO.value)) { val response = getResource("error_audio_playback") responseBuilder.add( SimpleResponse().setDisplayText(response).setTextToSpeech(response) ) return responseBuilder.build() } val chord = request.getParameter(PARAMETER_NOTE) as String val document = getDatabase().collection(COLLECTION_NOTES).document(chord).get().get() val input = document?.get(FIELD_NAME) val inputResponse = getResource("play_note_title").format(input) responseBuilder.add( SimpleResponse().setDisplayText(inputResponse).setTextToSpeech(inputResponse) ) val audioResponse = document?.get(FIELD_AUDIO) responseBuilder.add( MediaResponse() .setMediaType("AUDIO") .setMediaObjects( listOf( MediaObject() .setName(inputResponse) .setContentUrl(audioResponse as String) ) ) ) if (request.hasCapability(Capability.SCREEN_OUTPUT.value)) { responseBuilder.add(Suggestion() .setTitle(getResource("suggestion_play_another_note"))) } return responseBuilder.build() }
The first check that you might notice above is the surface capability check. This intent is pretty useless unless the device has audio capability, so we perform a check here and let the user know that this is required.
```kotlin
if (!request.hasCapability(Capability.MEDIA_RESPONSE_AUDIO.value)) {
    // ...
}
```
After we fetch the data from Firestore (this works the same as the previous examples in this post) we make use of the MediaResponse class to build the response to be presented to the user. Here we are required to set the type of media that is being used and attach a MediaObject instance to our MediaResponse. We’ve already stated that our media type is AUDIO, so here we provide a name to be displayed on the card along with the URL for the content to be played.
```kotlin
responseBuilder.add(
    MediaResponse()
        .setMediaType("AUDIO")
        .setMediaObjects(
            listOf(
                MediaObject()
                    .setName(inputResponse)
                    .setContentUrl(audioResponse as String)
            )
        )
)
```
And as per the previous intents, we again add a Suggestion chip to give the user an easy way to continue their conversation.
```kotlin
responseBuilder.add(Suggestion()
    .setTitle(getResource("suggestion_play_another_note")))
```
Our last intent is one to handle the completion of the audio player. When using audio responses in Actions on Google, the handle.finish.audio intent must be implemented, otherwise an error will be thrown with the response. All we do here is acknowledge the completion and offer the ability to repeat the previously played note or play another.
@ForIntent("handle.finish.audio") fun handleFinishAudio(request: ActionRequest): ActionResponse { val note = request.getContext(CONTEXT_NOTE_FOLLOWUP)?.parameters ?.get(PARAMETER_NOTE) as String val responseBuilder = getResponseBuilder(request) val inputResponse = getResource("audio_completion_response") responseBuilder.add( SimpleResponse().setDisplayText(inputResponse).setTextToSpeech(inputResponse) ) if (request.hasCapability(Capability.SCREEN_OUTPUT.value)) { responseBuilder.add(Suggestion().setTitle("Repeat $note")) responseBuilder.add(Suggestion() .setTitle(getResource("suggestion_play_another_note"))) } return responseBuilder.build() }
In this article we’ve taken a dive into the Kotlin Client Library for Actions on Google, looking at how it’s been used to build a real-world production conversational tool. The client library is fairly new, but it already offers a range of functionality to create conversational tools of your own.
In my next article I plan on looking at the entirety of the library, and in future we’ll look at how we can handle account linking and transactions using the client library. In the meantime, if you have any questions then feel free to reach out 🙂