Cutting through the smog: making an air quality bot with Haskell

In this tutorial, I want to show you how a chatbot can help you reduce your pollution intake, and how to build one in Haskell using the line-bot-sdk. This tutorial assumes some familiarity with Haskell.

Short- and long-term exposure to air pollution can result in significant health problems. When air quality is unhealthy we should avoid certain activities, which raises the question: how can we get notified when air quality is poor? This post is an attempt at answering that.

The idea behind this chatbot is simple: users share their location, the bot reads publicly available pollution data from local monitoring stations, and if the air is unhealthy it pushes a message to the users to let them know.

Why use a chatbot?

Chatbots allow users to have a smooth interaction with your service, and it’s relatively easy to integrate a chatbot with messaging apps.

Using a chatbot has the additional benefit that groups, and not only individual users, can interact with it. For example, in this tutorial you will learn how to drop a bot into your family chatroom, so that you and your relatives get promptly notified when air quality is poor in the places you care about.

We are going to be using the line-bot-sdk, a Haskell SDK for the LINE messaging platform. You can read an overview of the LINE Messaging API here.

This blog post was generated from literate Haskell sources. For those who prefer to read the code, an extracted version can be found here.

The line-bot-sdk uses the servant framework, a set of packages for declaring web APIs at the type-level. Here are the GHC language extensions we need for this example to work:

Imports

Parsing measurement data

Taiwan’s Environmental Protection Administration monitors air pollution in major cities and counties across Taiwan. They have a public API with the latest recorded air pollution data:

This API returns a JSON array with measured data from all the monitoring stations in Taiwan, typically updated every hour.

An AQI under 100 signifies good or acceptable air quality, while a number over 100 is cause for concern. The reported pollutants include particulate matter, ground-level ozone, carbon monoxide and sulfur dioxide.

We will additionally need the location of the measurement, which we will use to find the closest available data to our users:
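As a rough sketch, such a record could look like the following. The field names aqi, county, status, pm25, pm10, o3, co and so2 are the ones used later in the alert layout; the lat and lng names, all the field types, and the sample value are assumptions made for illustration only.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Sketch of one air quality data point. The field types and the
-- lat/lng names are assumptions, not taken from the original sources.
data AQData = AQData
  { aqi    :: Int     -- air quality index
  , county :: Text    -- county where the monitoring station is located
  , lat    :: Double  -- station latitude, in degrees
  , lng    :: Double  -- station longitude, in degrees
  , status :: Text    -- textual description of the air quality
  , pm25   :: Int     -- fine particulate matter
  , pm10   :: Int     -- coarse particulate matter
  , o3     :: Int     -- ground-level ozone
  , co     :: Double  -- carbon monoxide
  , so2    :: Double  -- sulfur dioxide
  } deriving (Eq, Show)

-- A made-up data point, for illustration only.
sample :: AQData
sample = AQData
  { aqi = 120, county = "Taipei City", lat = 25.03, lng = 121.56
  , status = "Unhealthy for sensitive groups"
  , pm25 = 43, pm10 = 60, o3 = 35, co = 0.4, so2 = 2.1
  }
```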

However, first we need to do some data preprocessing:

  • note that all the JSON fields are strings, but our AQData type requires numeric values
  • there are some data points missing relevant details, such as the AQI or the location:

So we need to remove such data points, since they amount to noise. One possible way to do this is by wrapping [AQData] in a newtype:

And then provide an instance of the FromJSON class to decode and filter bad values:

However, there is another possibility. FromJSON has another method we can implement:

Array items go through parseAQData. Here the MaybeT monad transformer produces a value only if all fields are present:
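To make the idea concrete, here is a minimal sketch of the technique on a cut-down record. It relies on aeson's parseJSONList method, which the list instance of FromJSON uses when decoding a JSON array. The Reading type, the parseReading name and the JSON keys "AQI", "Latitude" and "Longitude" are assumptions for illustration; the real AQData parser would read more fields in the same way.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.Trans.Maybe (MaybeT (..))
import Data.Aeson
import Data.Aeson.Types (Parser)
import Data.Foldable (toList)
import Data.Maybe (catMaybes)
import Text.Read (readMaybe)

-- Cut-down, hypothetical stand-in for AQData.
data Reading = Reading
  { readingAqi :: Int
  , readingLat :: Double
  , readingLng :: Double
  } deriving (Eq, Show)

-- Every JSON field is a string, so each one is read with readMaybe.
-- MaybeT short-circuits to Nothing as soon as one field fails to read
-- (for example when it is an empty string).
parseReading :: Value -> Parser (Maybe Reading)
parseReading = withObject "Reading" $ \o -> runMaybeT $ do
  a   <- MaybeT $ readMaybe <$> o .: "AQI"
  la  <- MaybeT $ readMaybe <$> o .: "Latitude"
  lo  <- MaybeT $ readMaybe <$> o .: "Longitude"
  pure (Reading a la lo)

instance FromJSON Reading where
  -- A single incomplete reading is a parse failure...
  parseJSON v = parseReading v >>= maybe (fail "incomplete reading") pure
  -- ...but inside an array, incomplete readings are silently dropped.
  parseJSONList = withArray "[Reading]" $ \arr ->
    catMaybes <$> traverse parseReading (toList arr)
```

With this instance, decoding an array as [Reading] keeps only the complete data points, so no newtype wrapper is needed.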

We then use the catMaybes :: [Maybe a] -> [a] function to weed out the Nothings and return a list of AQData. Now that we have a FromJSON instance, we can write a client function to call this API:

Here we only intercept exceptions of type HTTPException. For simplicity we just retry if the request fails; in practice you should inspect the error and implement retries with exponential backoff.
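As an illustration of the backoff idea, here is a small sketch that uses only base. It is not the code of this bot: retryWithBackoff, flaky and demo are hypothetical names, and the exception type to catch is chosen by the caller through the result type (IOException in the demo, where a real client would pick its HTTP exception type).

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (Exception, IOException, throwIO, try)
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- Run an action up to maxAttempts times, doubling the delay (in
-- microseconds) after every failure. Only exceptions of type e are
-- caught; the last one is returned if all attempts fail.
retryWithBackoff :: Exception e => Int -> Int -> IO a -> IO (Either e a)
retryWithBackoff maxAttempts initialDelay action = go 1 initialDelay
  where
    go attempt delay = do
      result <- try action
      case result of
        Right value -> pure (Right value)
        Left err
          | attempt >= maxAttempts -> pure (Left err)
          | otherwise -> do
              threadDelay delay
              go (attempt + 1) (delay * 2)

-- Hypothetical flaky action: fails on the first two calls, then
-- returns the number of calls made so far.
flaky :: IORef Int -> IO Int
flaky counter = do
  n <- atomicModifyIORef' counter (\k -> (k + 1, k + 1))
  if n < 3 then throwIO (userError "temporary failure") else pure n

-- Usage: at most 5 attempts, starting with a 1 ms delay.
demo :: IO (Either IOException Int)
demo = do
  counter <- newIORef 0
  retryWithBackoff 5 1000 (flaky counter)
```

A production version would typically also cap the delay, add jitter, and only retry errors that are known to be transient.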

Distance between two geo points

We want our bot to notify users of unhealthy air in the regions where they live and work, so first we need to know which monitor is the closest to the users. For that, we will use the haversine formula, which determines the great-circle distance between two points on a sphere given their longitudes and latitudes.
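For reference, given two points with latitudes \varphi_1, \varphi_2 and longitudes \lambda_1, \lambda_2 (in radians) on a sphere of radius r, the haversine formula gives their distance d as:

```latex
d = 2r \arcsin\left( \sqrt{ \sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right) } \right)
```

For the Earth, a mean radius of about 6371 km is commonly used for r, which yields distances in kilometers.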

First let’s define a type alias for latitude/longitude pairs (in degrees):

With distance we can calculate the distance in kilometers between any two given geo points. Now it only remains to extract the air quality data point that is closest to a given location:

minimumOn :: Ord b => (a -> b) -> [a] -> a is defined in the package extra.
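The pieces of this section can be sketched as follows. To keep the example self-contained it uses minimumBy from base rather than minimumOn from extra, and it returns a Maybe so that an empty list is handled explicitly. The names earthRadiusKm and closestOn are assumptions for illustration.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- (latitude, longitude), both in degrees.
type Coord = (Double, Double)

-- Mean Earth radius in kilometers.
earthRadiusKm :: Double
earthRadiusKm = 6371

-- Great-circle distance in kilometers, using the haversine formula.
distance :: Coord -> Coord -> Double
distance (lat1, lng1) (lat2, lng2) =
  2 * earthRadiusKm * asin (sqrt (min 1 h))
  where
    h = hav (rad lat2 - rad lat1)
      + cos (rad lat1) * cos (rad lat2) * hav (rad lng2 - rad lng1)
    hav x = sin (x / 2) ^ (2 :: Int)  -- haversine of an angle
    rad deg = deg * pi / 180          -- degrees to radians

-- Item closest to a location, or Nothing if there are no items.
-- The first argument extracts the coordinates of an item.
closestOn :: (a -> Coord) -> Coord -> [a] -> Maybe a
closestOn _ _ [] = Nothing
closestOn locate here xs =
  Just (minimumBy (comparing (distance here . locate)) xs)
```

For a list of data points, locate would be a function that returns the coordinates of the monitoring station.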

App environment

Most of the computations we are going to define require reading values from a shared environment:

This way we can pass around the channel token and secret, and the list of users, which are represented as (Source, Coord). Source is defined in Line.Bot.Webhook.Events and it contains the Id of the user, group or room where push messages will be sent.

The user list will be concurrently read and updated from different threads, so we store it here in a mutable variable, using Control.Concurrent.STM.TVar from the stm package [1].

We are going to use mtl type classes instead of a concrete monad transformer stack for this tutorial [2], with functions being polymorphic in their effect type. One benefit of this approach is that type constraints clearly express (and enforce) which effects can take place, and as a bonus it gives us more options for composition.
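A minimal sketch of such an environment, together with two functions written in this style, could look like this. The ChannelToken, ChannelSecret and Source newtypes below are simplified stand-ins so that the example compiles on its own; in the bot they would be the types exported by line-bot-sdk. The helper names newEnv, addUser and getUsers are assumptions for illustration.

```haskell
{-# LANGUAGE FlexibleContexts #-}

import Control.Concurrent.STM (TVar, atomically, modifyTVar', newTVarIO, readTVarIO)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Reader (MonadReader, asks, runReaderT)

-- Simplified stand-ins for the SDK types (assumptions).
newtype ChannelToken  = ChannelToken String
newtype ChannelSecret = ChannelSecret String
newtype Source        = Source String deriving (Eq, Show)

type Coord = (Double, Double)

data Env = Env
  { token  :: ChannelToken
  , secret :: ChannelSecret
  , users  :: TVar [(Source, Coord)]  -- shared, mutable list of users
  }

-- Build an environment with an empty user list.
newEnv :: ChannelToken -> ChannelSecret -> IO Env
newEnv t s = Env t s <$> newTVarIO []

-- The constraints say exactly what these functions may do:
-- read an Env and perform IO, nothing more.
addUser :: (MonadReader Env m, MonadIO m) => Source -> Coord -> m ()
addUser source coord = do
  var <- asks users
  liftIO . atomically $ modifyTVar' var ((source, coord) :)

getUsers :: (MonadReader Env m, MonadIO m) => m [(Source, Coord)]
getUsers = asks users >>= liftIO . readTVarIO
```

Because both functions are polymorphic in m, they can run in ReaderT Env IO, in ReaderT Env Handler, or in any other monad that satisfies the constraints, for example runReaderT (addUser src coord) env.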

Handling webhook events

When an event occurs, such as our bot joining a chatroom, an HTTP POST request is sent to the webhook URL we registered with the channel. Here we are interested in three types of events (other events are just ignored):

  • when our bot is added as a friend (or unblocked)
  • when our bot joins a group or room
  • when our bot receives a location message from a user

For the first two, we reply with a text message that contains a quick reply button, with a location action: this allows the users to easily share their location for air monitoring.
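The dispatch logic described above can be sketched with a simplified model. Note that the Event and Action types below are hypothetical stand-ins invented for this example, not the event types exported by line-bot-sdk; the point is only to show the shape of the pattern match.

```haskell
type Coord = (Double, Double)

-- Hypothetical stand-ins for the SDK types.
newtype ReplyToken = ReplyToken String deriving (Eq, Show)
newtype Source     = Source String deriving (Eq, Show)

-- Simplified model of the incoming webhook events.
data Event
  = Follow ReplyToken             -- bot added as a friend (or unblocked)
  | Join ReplyToken               -- bot joined a group or room
  | LocationMessage Source Coord  -- somebody shared a location
  | Other                         -- anything else

-- What the bot should do in response to an event.
data Action
  = AskLocation ReplyToken  -- reply with a quick reply location button
  | AddUser Source Coord    -- remember where to send alerts
  | Ignore
  deriving (Eq, Show)

decide :: Event -> Action
decide (Follow tok)              = AskLocation tok
decide (Join tok)                = AskLocation tok
decide (LocationMessage src loc) = AddUser src loc
decide Other                     = Ignore
```

Separating the decision from its execution keeps this part pure and easy to test; the effectful code then only has to interpret each Action.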

We are using Line.Bot.Types.ReplyToken, which is included in events that can be replied to:

Line is the monad to send requests to the LINE bot platform.

MessageText is a data constructor from Line.Bot.Types.Message. All messages can be sent with an optional QuickReply. Quick replies allow users to select from a predefined set of possible replies; see here for more details on using quick replies.

Finally, runLine runs the given request with the channel token from the environment [3]:

Once we receive a location message event, we add the user and her location to the shared list of users:

We add the source of the event, so if the message was sent from a group, we will notify the group, not the user who shared the location.

Note that a real-world chatbot should also handle the opposite events, such as unfollow or leave.

Serving the webhook: WAI application

To serve our webhook API we need to produce a WAI app.

The line-bot-sdk exports a type synonym defined in Line.Bot.Webhook that encodes the LINE webhook API:

The LineReqBody combinator will validate that incoming requests originate from the LINE platform.

Servant handlers run by default in the Handler monad. To let our webhook handler read the environment Env (which is enforced by the type constraint in webhook), we are going to stack the Reader monad on top of it.
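The general technique can be sketched on a toy API, using servant's hoistServer to translate handlers written in ReaderT back into Handler. The API type, Config record and names below are hypothetical and unrelated to the LINE webhook; the webhook itself additionally needs a Context, so it would use the WithContext variants of these functions.

```haskell
{-# LANGUAGE DataKinds     #-}
{-# LANGUAGE TypeOperators #-}

import Control.Monad.Reader (ReaderT, asks, runReaderT)
import Network.Wai (Application)
import Servant

-- Hypothetical environment for the toy API.
newtype Config = Config { greeting :: String }

-- A single endpoint: GET /greeting returns a JSON string.
type API = "greeting" :> Get '[JSON] String

-- Handlers live in ReaderT Config Handler, so they can read the Config.
handlers :: ServerT API (ReaderT Config Handler)
handlers = asks greeting

api :: Proxy API
api = Proxy

-- hoistServer applies a natural transformation to every handler;
-- here it supplies the Config, turning ReaderT Config Handler into Handler.
mkApp :: Config -> Application
mkApp cfg = serve api (hoistServer api (`runReaderT` cfg) handlers)
```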

It is beyond the scope of this tutorial to cover the nuts and bolts of the Servant web framework, which are well-covered in e.g. the servant tutorials.

The final step is to turn our aqServer into a WAI Application.

Servant allows passing values to combinators by using a Context. The LineReqBody combinator requires a Context with the channel secret. This is enforced by the type-level list '[ChannelSecret].

Periodic updates

We previously defined getAQData, which is an IO action that returns the list of (valid) data points. Our goal now is to call this API every hour to get the latest measured data and map it to our users, based on the location:

processAQData does several things [4]:

  • read the list stored in the transactional variable users in the environment: if the list is empty, retry, blocking the thread until users are added
  • call getAQData to get the most recently available air quality data
  • run a list comprehension that maps each user, of type (Source, Coord), to (Source, AQData)
  • for each user, call notifyChat
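The first and third steps above can be sketched in isolation as follows. The names waitForUsers, pairWithClosest, snapshot and demo are assumptions for illustration, and the function that picks the closest reading is passed in as a parameter so that the example does not depend on the rest of the code.

```haskell
import Control.Concurrent.STM (STM, TVar, atomically, newTVarIO, readTVar, retry)

-- Read the list of users. If it is empty, retry: the transaction blocks
-- until another thread modifies the TVar, and is then run again.
waitForUsers :: TVar [a] -> STM [a]
waitForUsers var = do
  us <- readTVar var
  if null us then retry else pure us

-- Pair every user with its closest reading. Users for which no
-- reading is available are dropped by the pattern match on Just.
pairWithClosest :: (loc -> [r] -> Maybe r) -> [(u, loc)] -> [r] -> [(u, r)]
pairWithClosest closest us readings =
  [ (u, r) | (u, loc) <- us, Just r <- [closest loc readings] ]

-- Take a snapshot of the users, then pair them with the readings.
snapshot :: TVar [(u, loc)] -> (loc -> [r] -> Maybe r) -> [r] -> IO [(u, r)]
snapshot var closest readings = do
  us <- atomically (waitForUsers var)
  pure (pairWithClosest closest us readings)

-- Usage with made-up data: one user, and a "closest" function that
-- simply picks the first reading.
demo :: IO [(String, Int)]
demo = do
  var <- newTVarIO [("U1", ())]
  snapshot var firstReading [150, 80]
  where
    firstReading _ rs = case rs of
      []      -> Nothing
      (r : _) -> Just r
```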

To alert users, we will push messages to those users whose closest monitoring station reports an AQI over 100:

processAQData needs to be called periodically, at least once every hour. We will run it in a separate thread, so that it runs concurrently with our webhook server:
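One simple way to do this with base alone is sketched below. The names every, oneHour and demo are assumptions for illustration; in the bot, the action passed to every would be the one that processes the air quality data.

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)
import Control.Monad (forever)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Fork a thread that runs the action forever, sleeping for the given
-- number of microseconds between runs.
every :: Int -> IO () -> IO ThreadId
every delayMicros action =
  forkIO . forever $ action >> threadDelay delayMicros

-- One hour, in microseconds (assumes a 64-bit Int).
oneHour :: Int
oneHour = 60 * 60 * 1000000

-- Demo: run a counter every 10 ms for about 200 ms, then stop it.
demo :: IO Int
demo = do
  counter <- newIORef (0 :: Int)
  tid <- every 10000 (modifyIORef' counter (+ 1))
  threadDelay 200000
  killThread tid
  readIORef counter
```

Note that an uncaught exception in the action terminates the forked thread, so a long-running bot would want to catch and log errors inside the loop.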

Air quality alerts

To inform users of pollution levels, we will use Flex Messages, which are messages with customizable layouts written in JSON format.

A Flex message is constructed from an alternative text (for clients not supporting the feature), a Data.Aeson.Value which contains the message layout and content, and an optional quick reply:

To design the layout of the alert message we used the Flex Message Simulator. We will use the JSON quasiquoter aesonQQ, which converts (at compile time) a string representation of a JSON value into a Value:

flexContent :: AQData -> Value
flexContent AQData{..} = [aesonQQ|
  {
    "type": "bubble",
    "styles": {
      "footer": {
        "separator": true
      }
    },
    "header": {
      "type": "box",
      "layout": "vertical",
      "contents": [
        {
          "type": "text",
          "text": "AIR QUALITY ALERT",
          "weight": "bold",
          "size": "xl",
          "color": "#ff0000",
          "margin": "md"
        },
        {
          "type": "text",
          "text": "Unhealthy air reported in your area",
          "size": "xs",
          "color": "#aaaaaa",
          "wrap": true
        }
      ]
    },
    "body": {
      "type": "box",
      "layout": "vertical",
      "contents": [
        {
          "type": "box",
          "layout": "vertical",
          "margin": "xxl",
          "spacing": "sm",
          "contents": [
            {
              "type": "box",
              "layout": "horizontal",
              "contents": [
                {
                  "type": "text",
                  "text": "County",
                  "size": "sm",
                  "color": "#555555",
                  "flex": 0
                },
                {
                  "type": "text",
                  "text": #{county},
                  "size": "sm",
                  "color": "#111111",
                  "align": "end"
                }
              ]
            },
            {
              "type": "box",
              "layout": "horizontal",
              "contents": [
                {
                  "type": "text",
                  "text": "Status",
                  "size": "sm",
                  "color": "#555555",
                  "flex": 0
                },
                {
                  "type": "text",
                  "text": #{status},
                  "size": "sm",
                  "color": "#111111",
                  "align": "end"
                }
              ]
            },
            {
              "type": "box",
              "layout": "horizontal",
              "contents": [
                {
                  "type": "text",
                  "text": "AQI",
                  "weight": "bold",
                  "size": "sm",
                  "color": "#ff0000",
                  "flex": 0
                },
                {
                  "type": "text",
                  "text": #{show aqi},
                  "weight": "bold",
                  "size": "sm",
                  "color": "#ff0000",
                  "align": "end"
                }
              ]
            },
            {
              "type": "box",
              "layout": "horizontal",
              "contents": [
                {
                  "type": "text",
                  "text": "PM2.5",
                  "size": "sm",
                  "color": "#555555",
                  "flex": 0
                },
                {
                  "type": "text",
                  "text": #{show pm25},
                  "size": "sm",
                  "color": "#111111",
                  "align": "end"
                }
              ]
            },
            {
              "type": "box",
              "layout": "horizontal",
              "contents": [
                {
                  "type": "text",
                  "text": "PM10",
                  "size": "sm",
                  "color": "#555555",
                  "flex": 0
                },
                {
                  "type": "text",
                  "text": #{show pm10},
                  "size": "sm",
                  "color": "#111111",
                  "align": "end"
                }
              ]
            },
            {
              "type": "box",
              "layout": "horizontal",
              "contents": [
                {
                  "type": "text",
                  "text": "O3",
                  "size": "sm",
                  "color": "#555555",
                  "flex": 0
                },
                {
                  "type": "text",
                  "text": #{show o3},
                  "size": "sm",
                  "color": "#111111",
                  "align": "end"
                }
              ]
            },
            {
              "type": "box",
              "layout": "horizontal",
              "contents": [
                {
                  "type": "text",
                  "text": "CO",
                  "size": "sm",
                  "color": "#555555",
                  "flex": 0
                },
                {
                  "type": "text",
                  "text": #{show co},
                  "size": "sm",
                  "color": "#111111",
                  "align": "end"
                }
              ]
            },
            {
              "type": "box",
              "layout": "horizontal",
              "contents": [
                {
                  "type": "text",
                  "text": "SO2",
                  "size": "sm",
                  "color": "#555555",
                  "flex": 0
                },
                {
                  "type": "text",
                  "text": #{show so2},
                  "size": "sm",
                  "color": "#111111",
                  "align": "end"
                }
              ]
            }
          ]
        }
      ]
    },
    "footer": {
      "type": "box",
      "layout": "horizontal",
      "contents": [
        {
          "type": "button",
          "action": {
            "type": "uri",
            "label": "More info",
            "uri": "https://www.epa.gov.tw/"
          }
        }
      ]
    }
  }
|]

Putting it all together

We are almost done! The only remaining part is to run our server and main loop:

  • read the channel token and secret from the environment
  • create an initial Env
  • thread the initial environment to our app and loop
  • call Network.Wai.Handler.Warp.run to run the webhook on port 3000

Here you can see we are actually instantiating loop and app to concrete monads.
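The wiring described above can be sketched as follows. To keep the example self-contained, the Env here holds plain strings, and placeholderApp and placeholderLoop stand in for the real app and loop. The environment variable names CHANNEL_TOKEN and CHANNEL_SECRET are also assumptions.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Concurrent (forkIO)
import Control.Concurrent.STM (TVar, newTVarIO)
import Network.HTTP.Types (status200)
import Network.Wai (Application, responseLBS)
import Network.Wai.Handler.Warp (run)
import System.Environment (getEnv)

-- Simplified stand-in for the environment used in this tutorial.
data Env = Env
  { token  :: String
  , secret :: String
  , users  :: TVar [(String, (Double, Double))]
  }

-- Create the initial environment, with no users yet.
mkEnv :: String -> String -> IO Env
mkEnv t s = Env t s <$> newTVarIO []

-- Placeholder for the webhook application: answers every request with 200.
placeholderApp :: Env -> Application
placeholderApp _ _request respond = respond (responseLBS status200 [] "ok")

-- Placeholder for the periodic update loop.
placeholderLoop :: Env -> IO ()
placeholderLoop _ = pure ()

start :: IO ()
start = do
  env <- mkEnv <$> getEnv "CHANNEL_TOKEN" <*> getEnv "CHANNEL_SECRET" >>= id
  _ <- forkIO (placeholderLoop env)  -- updates run concurrently
  run 3000 (placeholderApp env)      -- serve the webhook on port 3000
```

Both the loop and the app receive the same Env, so the user list stored in the TVar is shared between the two threads.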

Wanna be friends?

If you live in Taiwan, you can follow this very same bot and see how it works.

Conclusion

In this tutorial we have covered the development of a simple but practical chatbot. I hope you enjoyed reading (and perhaps coding along), and that it helps you get started with your own chatbot ideas!


  1. Bear in mind that in this example the list of users will not be persisted across restarts or crashes; in a production environment you should use a database to store users.

  2. This is the so-called mtl style of programming.

  3. Note that in order to keep this tutorial concise, we are not checking for possible errors, but you should pattern match the result of runLine. Line has an instance of MonadError ClientError so you can catch errors there, too.

  4. The naive solution we are building here to associate users with monitoring stations would be impractical for a real application. There it would make more sense to first filter the data points where pollution is of concern, then for each data point retrieve all users within a given distance (e.g. using a geospatial index), and notify them with multicast messages.