What day should we have a meetup? A httr story

This is going to be the shortest and fastest blogpost I’ve written so far. Or so I’m telling myself as I’m writing this, one can always hope.

In the not so distant past I gave a lightning talk at the BirminghamR meetup. The talk was about the httr package and how you can use it to extract data from an API. This blogpost is the written form of that talk minus the jokes (they don’t work when written down).

A reason to live

Why would I give a lightning talk about a package you’ve never heard of? Because I had a problem (among many others). The BirminghamR meetup usually takes place on a Thursday but some attendees suggested we try a different day of the week. I wanted to base that decision on data, specifically data about when other meetups are held in the Birmingham area. Most of the meetups, the R one included, use Meetup to organise events. And it just so happens that Meetup have an API.

Before I dive into httr I should probably explain what an Application Programming Interface (API) is. In its most general form an API is a communication protocol between computer programs. More specific to my use case, an API is a way to access data through websites. Now, httr is httr because it makes use of the HTTP protocol. This means that communication follows a fixed set of rules and regulations.

Compare it to a restaurant. You (a computer program) want to place an order from the menu (the data) with the chef (a database). The waiter (another computer program) is your API delivering your order to the chef and bringing the data to you. Or in other words, The Matrix was a movie about APIs fighting over the data.

Back to httr. The HTTP protocol defines a number of so-called verbs which allow you to perform actions or “communicate” with the program on the other side. The most common verbs are GET and POST. In httr these verbs are functions: GET() and POST().

So much text and we haven’t even gotten to the code yet. Bear with me for a moment, I still have to explain the Meetup API. A good API provides you with plenty of documentation about the type of data you can get, how to get it and what the response looks like. And the Meetup API is a very good API. The response consists of two parts, a status code and the actual data. The status code tells you if your request was correct or not, i.e. did you order something that wasn’t on the menu.

API: Application Pain In the …

Now that we’re both experts in APIs we can get back to the original problem. To find out when other meetup events took place we can query the /:urlname/events endpoint. This endpoint however requires us to know the name of the meetup (urlname), so first we need to find all the groups in the Birmingham area. We can do so using the /find/groups endpoint as follows:

library(httr)
response <- GET("https://api.meetup.com/find/groups",
                query = list(location='Birmingham, UK', 
                             radius=10, category=34))

The GET function takes an API endpoint and a list of parameters (as key/value pairs). The API documentation will tell you what the valid parameters are. In this case, we want to find all groups in the Birmingham area (location='Birmingham, UK') within a 10 mile radius (radius=10) and within the Tech category (category=34).
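To see what the query list turns into, you can build the URL without sending anything. This is a small sketch using httr's modify_url(); the endpoint and parameters are the ones from the call above, and the variable name url is just for illustration.

```r
library(httr)

# Build the request URL from the endpoint and the key/value pairs.
# Nothing is sent over the network here.
url <- modify_url(
  "https://api.meetup.com/find/groups",
  query = list(location = "Birmingham, UK", radius = 10, category = 34)
)

# The parameters end up after the "?" as key=value pairs joined by "&",
# with special characters such as the space and comma percent-encoded.
print(url)
```

Printing the URL like this is a handy way to check a request against the API documentation before you send it.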

Before we browse through the delicious data we retrieved, we have to make sure our request was correct.

http_status(response)$reason
## [1] "Bad Request"
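http_status() also accepts a bare status code, so you can look up what a code means without making a request. A quick sketch; the three codes below are arbitrary examples, not ones taken from the Meetup documentation.

```r
library(httr)

# Look up the category and reason for a few status codes.
codes <- c(200, 400, 404)
reasons <- vapply(codes, function(code) http_status(code)$reason, character(1))
categories <- vapply(codes, function(code) http_status(code)$category, character(1))

# One row per code: the number, its broad category and the short reason.
print(data.frame(code = codes, category = categories, reason = reasons))

# http_error() answers the yes/no question: did this code signal a failure?
# (it expects an integer here, hence the L suffix)
failed <- http_error(400L)
```

In a script you would typically call http_error(response) or stop_for_status(response) on the real response before trying to parse it.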

A “Bad Request” can mean many things but in this case a little digging revealed that this particular endpoint requires authorization. That is a topic beyond the scope of this blogpost and one I don’t have time to go into. As a quick workaround I just used the console provided by the Meetup API (whilst logged in) and copy/pasted the JSON output.

## $urlname
## [1] "Silicon-Canal-Tech-Drinks"
## 
## $urlname
## [1] "Agile-West-Midlands"
## 
## $urlname
## [1] "brum-ai"
## 
## $urlname
## [1] "meetup-group-MzfqIqCy"
## 
## $urlname
## [1] "tech-wednesday"
## 
## $urlname
## [1] "BirminghamR"

It may not be perfect but it allows us to continue getting the events. So using the name of the groups we query the events endpoint:

response <- GET("https://api.meetup.com/BirminghamR/events", 
                query = list(no_earlier_than='2018-01-01T00:00:00.000', 
                             status = 'past'))
parsed <- content(response, "parsed")
parsed[[1]]$local_date
## [1] "2018-01-25"

The response of an API can be in many formats (CSV, XML, JSON) but this is where httr helps us out. Using the content function with the “parsed” argument we can convert the JSON to a list. Each event is an item in this list and in this case we extract the date of the event.
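As a rough sketch of what that parsing step does, here is the same conversion done by hand with jsonlite on a tiny JSON string. The two events are made up for illustration (only the first date matches the output above), and a real response carries many more fields per event.

```r
library(jsonlite)
library(purrr)

# A made-up, heavily trimmed stand-in for the body of an events response.
json <- '[
  {"name": "Example event A", "local_date": "2018-01-25"},
  {"name": "Example event B", "local_date": "2018-02-22"}
]'

# simplifyVector = FALSE keeps the result as nested lists, one item per event.
parsed <- fromJSON(json, simplifyVector = FALSE)

# Pull a single field out of the first event ...
first_date <- parsed[[1]]$local_date

# ... or the same field out of every event at once.
dates <- map_chr(parsed, "local_date")
print(dates)
```

Passing a name such as "local_date" to map_chr() is shorthand for extracting that element from each item in the list.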

We repeat the same for all the groups to obtain a dataset of event dates. Since we’re dealing with lists, purrr is the tool of choice.

library(purrr)
library(glue)

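# define a function to do the heavy lifting:
# fetch the past events of one group and return their dates
get_events <- function(meetup_name,
                       base_url = "https://api.meetup.com/{group}/events") {
  Sys.sleep(1)  # pause between requests to be kind to the API
  group <- meetup_name  # glue() fills {group} in base_url with this value
  r <- GET(glue(base_url),
           query = list(no_earlier_than = '2018-01-01T00:00:00.000',
                        status = 'past'))
  parsed <- content(r, "parsed")
  event_dates <- map_chr(parsed, "local_date")
  return(event_dates)
}

# meetup_groups holds the urlnames taken from the copy/pasted JSON above
all_dates <- map(meetup_groups, get_events) %>% unlist()
all_dates <- data.frame(date = all_dates)
head(all_dates)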
##         date
## 1 2018-01-04
## 2 2018-02-01
## 3 2018-03-01
## 4 2018-04-05
## 5 2018-05-03
## 6 2018-06-07
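To find the day of the week for each date, base R is enough. A minimal sketch using only the six dates printed above, so the counts are illustrative rather than the full result. The lookup vector day_names is a helper introduced here to avoid locale-dependent weekday names.

```r
# The six dates shown in the output above.
dates <- as.Date(c("2018-01-04", "2018-02-01", "2018-03-01",
                   "2018-04-05", "2018-05-03", "2018-06-07"))

# as.POSIXlt()$wday counts days from 0 (Sunday) to 6 (Saturday).
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
weekday <- factor(day_names[as.POSIXlt(dates)$wday + 1], levels = day_names)

# Count the events per weekday; days without events show up as 0.
counts <- table(weekday)
print(counts)
```

Keeping the weekday as a factor with all seven levels means quiet days still appear in the table, and later in a bar chart, instead of silently dropping out.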

Oh Happy Days

Alright then, we’re finally at the point where we want to be: making pretty graphs. First off, what was the most popular day?

And the answer to our question is that Wednesday was the busiest day. Weekends are obviously less busy but that’s expected. So what is left is Monday or Friday. Who wants a meetup on Monday?

Because making pretty graphs never ceases to be fun, here’s one more:

Fin

I went through a lot of trouble to answer a simple question. I actually went through too much trouble. Often when it comes to well-known APIs someone else has already done the heavy lifting and written a wrapper package. The wrapper package doesn’t require you to know anything about the API itself: no endpoints, no hassle with status codes. And the nice thing is that you usually get the results back in a neat data.frame. The wrapper package for the Meetup API is called meetupr and is maintained by R-Ladies.

If you want more information about httr check out https://httr.r-lib.org. And if you’re a masochist and want to write your own wrapper package using httr be sure to check out https://httr.r-lib.org/articles/api-packages.html.

ps. I did it! This was the fastest blogpost I’ve ever written :-D