Javascript Chatterbot Workshop


Programming Your Bot

The process of programming your chatterbot boils down to compiling a list of user statements paired with system responses. The statements and responses are coded using a pattern-matching technique known as regular expressions (or regex for short). Regular expressions are ideally suited to handling the ELIZA-type reflections we will need to convincingly simulate conversation.

Regex coding is a vast area, but fortunately we can accomplish all of our introductory goals with just a handful of methods.

The important thing to understand about regular expressions is that they enable you to search and process text based on a combination of fixed data and variable data. For example:

"I love New York."

could be coded as a regex and generate a response whenever the user typed the sentence "I love New York." However, we can substitute wildcard symbols for fixed data and give our system considerably more flexibility. Suppose we replaced "New York" in our previous example with the regex pattern ".*" - the resulting new pattern being

"I love .*"

Since ".*" is the regex equivalent of "anything", our new code would not only respond to "I love New York." but "I love ice cream." and "I love you." as well.
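If you'd like to see the wildcard at work before touching the bot, here is a small sketch you can run on its own. The variable names are made up for illustration, and the "^" and "$" anchors (which tie the pattern to the start and end of the input) are an assumption about how a whole-sentence match might be enforced, not something taken from the shell.

```javascript
// Illustrative sketch only; these names are not part of the bot shell.
// ^ and $ tie the pattern to the start and end of the input.
var lovePattern = new RegExp("^I love .*$");

var samples = ["I love New York.", "I love ice cream.", "I love you.", "I like tea."];
var loveResults = samples.map(function (s) { return lovePattern.test(s); });

console.log(loveResults); // [ true, true, true, false ]
```

The last sample fails because the fixed part of the pattern, "I love ", still has to appear exactly as written.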

Believe it or not, you actually now know enough about regular expressions to write a few statement and response pairs for your bot.

Since a literal statement with no wildcards is a perfectly valid regex,

"I love New York."

could be set up as a pattern to match against user input. A suitable bot response to such a statement might be:

"I hear it's a heck of a town - but of course I've never been there!"

To program your bot to make that response, we'll need to add those two regex patterns to the data used by the shell.

In the bot shell you will find a section that begins:

//----Data Declarations----

  var convpatterns = new Array ( 

The array which follows contains all of the statement and response patterns that make up the bot personality. The array takes on the general form:

var convpatterns = new Array ( 
    new Array  ("A regular expression to match against the input.","A regular expression to build a reply"),
    new Array  ("Another regular expression to match against the input.","Another regular expression to build a reply")
);

Each line that begins with new Array contains a statement/response pair. Each regex pattern is enclosed in quotation marks, and the patterns are separated by commas. Each line in the array ends with a comma, with the exception of the very last entry, which is followed by the close parenthesis of the outer array. To get our new data into the mix, we'll simply add a line in that same format:

    new Array  ("I love New York.","I hear it's a heck of a town - but of course I've never been there!"),

I'd encourage you to take a few moments and actually work through the example, editing and saving the code, and then giving it a try to make sure it works. Then pat yourself on the back because you're officially a chatterbot programmer!
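If you're curious what happens to an entry once it's in the array, here is a rough sketch of a lookup loop. It is a hypothetical stand-in, not the shell's actual code: the function name firstReply, the catch-all ".*" entry, the "^"/"$" anchoring, and the case-insensitive "i" flag are all assumptions made for illustration.

```javascript
// Hypothetical stand-in for the shell's matching loop; the real shell may differ.
var convpatterns = new Array(
    new Array("I love New York.", "I hear it's a heck of a town - but of course I've never been there!"),
    new Array(".*", "Tell me more.")   // made-up catch-all entry so every input gets a reply
);

// Return the first reply of the first entry whose pattern matches the whole input.
function firstReply(input) {
    for (var i = 0; i < convpatterns.length; i++) {
        // "i" = ignore upper/lower case (an assumption, not confirmed by the shell)
        var rx = new RegExp("^" + convpatterns[i][0] + "$", "i");
        if (rx.test(input)) {
            return convpatterns[i][1];
        }
    }
    return "";
}

console.log(firstReply("I love New York."));
console.log(firstReply("Hello there"));
```

Because the entries are tried from top to bottom in this sketch, a catch-all pattern like ".*" only makes sense at the very end of the array.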

Now let's return to the notion of wildcards, and see how we can start to jazz up our bot a bit. In the introduction we discussed a trigger example where the user typed any sentence containing the words "science fiction" and the system responds "Who is your favorite science fiction author?" If you already know something about regular expressions, you'll realize that a regex constructed as "science fiction" would in fact match all sentences containing those words. However, the way the bot shell is set up, we always need to write regexes that match the entire input. To achieve this, we can surround "science fiction" with wildcards: ".*science fiction.*". The new line for our data array to add this exchange would be:

    new Array  (".*science fiction.*", "Who is your favorite science fiction author?"),
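The difference between finding a keyword somewhere in the input and matching the entire input can be sketched as follows. The helper matchesWholeInput is hypothetical, and anchoring with "^" and "$" is only one plausible way a shell could insist on a whole-input match.

```javascript
// Sketch assuming whole-input matching is done with ^ and $ anchors
// (an assumption; the shell's own mechanism is not shown in this workshop).
function matchesWholeInput(pattern, input) {
    return new RegExp("^" + pattern + "$").test(input);
}

var sentence = "I read a lot of science fiction these days.";

console.log(matchesWholeInput("science fiction", sentence));     // false
console.log(matchesWholeInput(".*science fiction.*", sentence)); // true
console.log(new RegExp("science fiction").test(sentence));       // true (unanchored search)
```

The bare keyword only succeeds in the unanchored search; once the whole input has to match, the surrounding wildcards are what soak up the rest of the sentence.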

You'll notice if you look at some of the data in the default shell that you can have more than one reply per user statement. In fact you can have as many replies as you want and the reply will be selected at random. This is an excellent way to get a lot of bang for your coding buck, and to get a better natural language simulation. So let's add a couple of additional science fiction responses:

    new Array  (".*science fiction.*", "Who is your favorite science fiction author?",
                         "Have you seen any good SF movies lately?","Some people think I'm a bit far out myself!"),

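Picking one reply at random from an entry might look something like the sketch below. The helper pickReply and its optional rng parameter are hypothetical; the rng is only there so the choice can be made predictable when experimenting.

```javascript
// Hypothetical helper: entry[0] is the pattern, entry[1] onward are the candidate replies.
// rng is an optional stand-in for Math.random (illustration only).
function pickReply(entry, rng) {
    var random = rng || Math.random;
    var index = 1 + Math.floor(random() * (entry.length - 1)); // skips the pattern at index 0
    return entry[index];
}

var sciFi = new Array(".*science fiction.*", "Who is your favorite science fiction author?",
                      "Have you seen any good SF movies lately?", "Some people think I'm a bit far out myself!");

console.log(pickReply(sciFi));                            // one of the three replies, at random
console.log(pickReply(sciFi, function () { return 0; })); // always the first reply
```

Since the index is computed from the entry's length, adding a fourth or fifth reply to the line needs no other change in this sketch.
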
Our next step will be to write a regular expression that performs the ELIZA style reflection and uses parts of the user input to generate a response. Before getting into the technical side, let's go back and consider the big picture for a moment. Suppose we want to respond to any user input that begins with "I'll never" with a response that begins with "Perhaps someday you will" and then echoes back the phrase entered by the user. So "I'll never see Paris." generates the response "Perhaps someday you will see Paris." and "I'll never get that promotion." similarly generates the response "Perhaps someday you will get that promotion."

We already know how to use a wildcard to generate a match for any text the user enters. Regular expressions provide an additional feature: capturing the wildcard match so that we can use it later. In regex terminology this is called grouping, and it only requires enclosing the part to capture in parentheses, as in:

"(.*)"

The full regex for this match would be:

"I'll never (.*)"

Regular expressions keep track of your groups and number them from left to right, starting at 1. Inserting the match into our response requires what is called a backreference, which is coded by putting a dollar sign "$" in front of the group number. Since our example contains only one group, its number must be 1, so we refer to it as:

"$1"

Any place we put "$1" in the regular expression used for the response will insert the match from that group. So our response string in this example would be coded as "Perhaps someday you will $1". The entry in our data array would look like this:

    new Array  ("I'll never (.*)","Perhaps someday you will $1"),
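To try the capture and backreference outside the bot, a sketch along these lines should do. It relies on JavaScript's standard String replace method, which substitutes "$1" with the text captured by the first group; the variable names and the "^"/"$" anchors are assumptions for illustration.

```javascript
// Sketch: replace() substitutes $1 with whatever the first group captured.
var neverPattern = new RegExp("^I'll never (.*)$");
var neverReply = "Perhaps someday you will $1";

var echoed = "I'll never see Paris.".replace(neverPattern, neverReply);
console.log(echoed); // Perhaps someday you will see Paris.
```

Notice that the period from the input rides along inside the captured text.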

You might take a moment to add this line to your bot and see how it works.

Now I'll have to confess that I cheated a little bit to make that example simple. The match in this case is going to include the punctuation at the end of the user input, most likely a period. For this match it's OK because we've placed the captured phrase at the end of the reply sentence. However, if I wanted to place the match in the middle of the reply, as in "You can $1 if you want it badly enough.", the way things are set up now an extra period would appear in the middle of our reply: "You can get that promotion. if you want it badly enough." Oops. We have to fix our method so that the regex doesn't include the punctuation in "$1".

Specifying the punctuation turns out to be a little tricky. We've seen that regular expressions make use of a number of special symbols, which include

* ^ ? ( ) + $ .

You may see these symbols referred to as metacharacters. When metacharacters are used in a regular expression they take on special functions. However there are times when you will want the literal character to simply stand for itself rather than the function indicated by the metacharacter. In other words, sometimes a question mark is just a question mark. To indicate that you intend the literal character, you would precede it with a backslash:

"\"

as in:

"What's for dinner\?"

This process is called escaping the metacharacter.
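Here is a small sketch of what escaping changes. It uses JavaScript's slash-delimited regex literals rather than quoted strings so that the backslash reaches the regex engine untouched; the variable names are made up.

```javascript
// Regex literals (/.../) are used so the backslash is passed to the regex engine as-is.
var escaped = /^What's for dinner\?$/;   // \? is a literal question mark
var unescaped = /^What's for dinner?$/;  // ? makes the preceding "r" optional

console.log(escaped.test("What's for dinner?"));   // true
console.log(unescaped.test("What's for dinner?")); // false
console.log(unescaped.test("What's for dinne"));   // true
```

Without the backslash, the question mark acts on the letter before it instead of standing for itself, which is why the unescaped pattern rejects the very sentence it appears to spell out.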

Now we can go back and rewrite our example:

"I'll never (.*)\."

and to balance the books we'll need to add the period to our first response:

"Perhaps someday you will $1."

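Here is the new line for the data array with both possible replies. Because the pattern sits inside a quoted JavaScript string, the backslash is doubled: JavaScript drops a lone backslash in front of a period, which would leave a plain "." that matches any character.

    new Array  ("I'll never (.*)\\.","Perhaps someday you will $1.","You can $1 if you want it badly enough."),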
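One detail is worth checking whenever a pattern is stored in a quoted string. In a JavaScript string literal, a single backslash in front of a period is consumed by the string itself, so the regex engine only sees ".", which matches any character; writing two backslashes lets one survive. The sketch below compares the two spellings. The capture helper and the anchoring with "^" and "$" are assumptions for illustration.

```javascript
var single = "I'll never (.*)\.";    // string value ends in "."  : matches any character
var doubled = "I'll never (.*)\\.";  // string value ends in "\." : matches a literal period

// Hypothetical helper: return what group 1 captured, or null if there is no match.
function capture(pattern, input) {
    var m = new RegExp("^" + pattern + "$").exec(input);
    return m ? m[1] : null;
}

console.log(capture(single, "I'll never see Paris"));   // see Pari  (last letter swallowed)
console.log(capture(doubled, "I'll never see Paris"));  // null      (no period, so no match)
console.log(capture(doubled, "I'll never see Paris.")); // see Paris

var midSentence = "I'll never get that promotion.".replace(
    new RegExp("^" + doubled + "$"), "You can $1 if you want it badly enough.");
console.log(midSentence); // You can get that promotion if you want it badly enough.
```

With the single backslash the pattern still appears to work on input that ends in a period, which is why this slip is easy to miss.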

In summary, we now have three techniques we can use to create three types of input/response pairs: a literal match, a keyword trigger, and a conversational transformation. We also have a model that can be expanded indefinitely. Bringing it all together, here is a sample array for the bot we "built" while working through our three examples:

var convpatterns = new Array ( 
    new Array  ("I love New York.","I hear it's a heck of a town - but of course I've never been there!"),
    new Array  (".*science fiction.*", "Who is your favorite science fiction author?",
                      "Have you seen any good SF movies lately?","Some people think I'm a bit far out myself!"),
    // The backslash is doubled because "\." inside a JavaScript string literal is just "."
    new Array  ("I'll never (.*)\\.","Perhaps someday you will $1.","You can $1 if you want it badly enough.")
);
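To watch the three techniques work together, here is a rough end-to-end sketch. It is not the shell's code: the function respond, its optional rng parameter, the "^"/"$" anchoring, and the null result for unmatched input are all assumptions made so the example can run by itself.

```javascript
// Hypothetical end-to-end sketch; the real shell's matching logic may differ.
var convpatterns = new Array(
    new Array("I love New York.", "I hear it's a heck of a town - but of course I've never been there!"),
    new Array(".*science fiction.*", "Who is your favorite science fiction author?",
              "Have you seen any good SF movies lately?", "Some people think I'm a bit far out myself!"),
    // "\\." in a string literal reaches the regex engine as \. (a literal period)
    new Array("I'll never (.*)\\.", "Perhaps someday you will $1.", "You can $1 if you want it badly enough.")
);

// rng is an optional stand-in for Math.random so the choice can be made predictable.
function respond(input, rng) {
    var random = rng || Math.random;
    for (var i = 0; i < convpatterns.length; i++) {
        var entry = convpatterns[i];
        var rx = new RegExp("^" + entry[0] + "$");   // whole-input match (assumption)
        if (rx.test(input)) {
            var reply = entry[1 + Math.floor(random() * (entry.length - 1))];
            return input.replace(rx, reply);          // fills in $1 when the reply uses it
        }
    }
    return null; // no pattern matched
}

console.log(respond("I love New York."));
console.log(respond("I'll never see Paris.", function () { return 0; })); // Perhaps someday you will see Paris.
```

Passing a fixed rng, as in the last line, pins down which reply is chosen; leaving it out gives the random behavior described earlier.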