What is an API?

API stands for Application Program Interface, which may sound complicated, but all it is is a way for different programs to talk to each other. In this tutorial, we will go over a particular subset of APIs called RESTful APIs, which is simply a design pattern or type of APIs that is designed to be easy to use.

But before we go over RESTful APIs in particular, we need to define what makes an API an API. Prepare yourself for some jargon.

So using the technical language we have learned above, we can think of an API as a way for a client to pull resources from a server.

How Can We Interact with an API?

When we use an API to interact with a resource, we are making a request to a server. There are many different types of requests that we could make, but we will go through the two most common here.

  • GET: This request is for getting information from a server about a resource. As far as data analysis goes, this is likely the only way we will be interacting with a server
  • POST: This request is for posting information to a specified resource. For example, you could post a tweet using an API. This is how Twitter bots operate.

Let’s do our first GET request now, but using our browser as a client instead of code. Click this link. You should see a bunch of unintelligible text. This is the response from our GET request. What’s more important (and hopefully a bit demystifying) is the url. Let’s take a closer look:

https://pokeapi.co/api/v2/pokemon/ditto

What we pulled back was some information about the pokemon ditto, and as you can see, the last part of that url is the word ditto. What if we were to change this to pikachu. In that case, we would get information back about pikachu.

The best part of APIs is that they use a common design pattern to make different requests. That way requesting on pikachu versus ditto only means changing one thing.

What is a RESTful API?

REST stands for REpresentional State Transfer. Make sense? No? We can break that down further.

The state of resources on the web is constantly changing. How many followers a Twitter user has, for example, changes all the time. What a RESTful API does then is provide a representation of the state of a particular resource at any given moment. If the resource is a Twitter user, then the state of the resource could be things like its follower count, number of tweets, or favorited tweets.

Querying APIs in R

jsonlite

APIs often return data in the form of JSON files, which can be thought of as akin to lists in R. When this is the case (and when life is easy and you don’t need to authenticate), you can use jsonlite to directly access the results.

library(jsonlite)

# this will return a list with information about ditto
ditto = fromJSON("https://pokeapi.co/api/v2/pokemon/ditto")

ditto$species$name
## [1] "ditto"

API Tokens

APIs need a mechanism to control who accesses their data and how many and what resources a user can access. This mechanism can exist in a few different forms. For example, the pokemon API restricts how many queries a user can execute to 100 per minute.

Another common restriction is tokens. Tokens are something like a password to use with any given API. Later in this tutorial, you will need a token in order to access OMDB’s API.

Like any other password, you really shouldn’t share it in a public forum like GitHub (it's typically against the ToS as well). To get around this, we will store it in a variable called key. So I don’t have to keep manually entering key, I typically create it in a separate script and put that script in the .gitignore. This is just a special file that git will always ignore.

Now that we have an API Key, let's take a look at OMDB’s url to see how it fits in:

http://omdbapi.com/?apikey=<<YOUR KEY GOES HERE>>&t=<<MOVIE TITLE GOES HERE>>

Pretty simple. Put your API key after the part that says ?apikey=.

Problem 1: Building a Pokemon Dataset

Problem

Using the Pokemon API from above, build a dataframe that includes the name, height, weight, base experiance, and primary type for at least four pokemon.

Hints

  • Use fromJSON() to pull back
  • Use a for loop to loop through your chosen pokemon
  • paste0() can be used to change the end point of your API

If you are getting stuck, I would recommend pulling back just a single pokemon and then looking for the information that you need.

Solution

There will definitely be many different ways to accomplish this, but I will use a combination of a for loop, append(), and paste0()

library(jsonlite)
library(kableExtra)
library(dplyr)

pokemon = c("pikachu", "ditto", "eevee", "charizard")

# initialize outside of for loop
names = c()
height = c()
weight = c()
base_experience = c()
primary_type = c()

# loop through different pokemon
for(i in 1:length(pokemon)){
  
  # this will return a list with information about ditto
  data = fromJSON(paste0("https://pokeapi.co/api/v2/pokemon/", pokemon[i]))
  
  # append will add the result to the end of the vector
  names = append(names, data$species$name)
  height = append(height, data$height)
  weight = append(weight, data$weight)
  base_experience = append(base_experience, data$base_experience)
  primary_type = append(primary_type, data$types$type$name[1])
}

# data will compile the vectors into a dataframe
pokemon_df = data.frame(
  names,
  height,
  weight,
  base_experience,
  primary_type
)

kable(head(pokemon_df), format = "html")%>%
  kable_styling("striped")
names height weight base_experience primary_type
pikachu 4 60 112 electric
ditto 3 40 101 normal
eevee 3 65 65 normal
charizard 17 905 240 flying

Problem 2: Making a Movie Dataset

Problem

The Open Movie Database has an API that allows users to pull data on up to 1,000 films a day for free. After you receive the API key in an email, activate the key following the link in the email. Then make a dataframe that contains the title, year, MPAA rating, runtime, genre, and actors for a few of your favorite movies!

Solution

library(jsonlite)
library(kableExtra)
library(dplyr)

films = c("Batman+Begins", "The+Incredibles", "The+Godfather", "Parasite")

 titles = c()
 years = c()
 ratings = c()
 runtimes = c()
 genres = c()
 actors = c()
 
 
for(i in 1:length(films)){

  data = fromJSON(paste0("http://omdbapi.com/?apikey=", omdbkey,"&t=", films[i]))
  
  titles = append(titles, data$Title)
  years = append(years, data$Year)
  ratings = append(ratings, data$Rated)
  runtimes = append(runtimes, data$Runtime)
  genres = append(genres, data$Genre)
  actors = append(actors, data$Actors)
  }

 film_df = data.frame(
   titles,
   years,
   ratings,
   runtimes,
   genres,
   actors
)

kable(head(film_df), format = "html")%>%
  kable_styling("striped")
titles years ratings runtimes genres actors
Batman Begins 2005 PG-13 140 min Action, Adventure Christian Bale, Michael Caine, Liam Neeson, Katie Holmes
The Incredibles 2004 PG 115 min Animation, Action, Adventure, Family Craig T. Nelson, Holly Hunter, Samuel L. Jackson, Jason Lee
The Godfather 1972 R 175 min Crime, Drama Marlon Brando, Al Pacino, James Caan, Richard S. Castellano
Parasite 2019 R 132 min Comedy, Drama, Thriller Kang-ho Song, Sun-kyun Lee, Yeo-jeong Jo, Woo-sik Choi

Conclusion

APIs are designed to be a stable way for scripts, programs, and applications to interact with each other. In general, if you are text mining, you will just need to GET requests from an API as you are hoping to get data.

Since most data in APIs comes in the form of JSON files, we went over the fromJSON() function in jsonlite. However, R can do every kind of request using the httr package.