Randy Minder

Reputation: 48392

Adding a column to a data frame with recoding

I'm going through a DataCamp class on dplyr. They had me load the 'hflights' data and then asked me to create a new column named 'Carrier' that replaces each airline code with the airline's actual name. The solution looks as follows:

    hflights <- tbl_df(hflights)

    names <- c("AA" = "American", "AS" = "Alaska", "B6" = "JetBlue", "CO" = "Continental",
             "DL" = "Delta", "OO" = "SkyWest", "UA" = "United", "US" = "US_Airways",
             "WN" = "Southwest", "EV" = "Atlantic_Southeast", "F9" = "Frontier",
             "FL" = "AirTran", "MQ" = "American_Eagle", "XE" = "ExpressJet", "YV" = "Mesa")

    hflights["Carrier"] <- names[hflights$UniqueCarrier]

I got this to work, but it's not really clear to me exactly what R is doing here. I understand I'm adding a new column to the hflights data frame, but I'm not clear on how (or why) R is substituting carrier names for carrier codes.

Upvotes: 1

Views: 318

Answers (2)

lmo

Reputation: 38500

This is a lookup table: the names of a named vector are used to return the corresponding values in that vector. A couple of examples follow.

As a reminder, a named vector can be subset either by position or by name:

    names[1:2]
            AA         AS 
    "American"   "Alaska" 
    names[c("AA", "AS")]
            AA         AS 
    "American"   "Alaska" 

A nice feature is that these references can be repeated to produce an extended vector:

    names[rep(1:2, 2)]
            AA         AS         AA         AS 
    "American"   "Alaska" "American"   "Alaska"
    names[rep(c("AA", "AS"), 2)]
            AA         AS         AA         AS 
    "American"   "Alaska" "American"   "Alaska"

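Using this method, a vector containing either positions or names from the lookup table produces a result of the same length as that index vector, with the desired values. That is what names[hflights$UniqueCarrier] does: the code column is the index vector, so you get one full carrier name per row.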
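As a minimal sketch of the same idea on a data frame: the four-row `flights` data frame and the shortened `carrier_names` table below are made up for illustration and are not the hflights data.

```r
# Hypothetical lookup table: the names are the codes, the values are the full names
carrier_names <- c("AA" = "American", "AS" = "Alaska", "B6" = "JetBlue")

# Hypothetical stand-in for hflights, with the codes stored as character
flights <- data.frame(UniqueCarrier = c("AS", "AA", "AS", "B6"),
                      stringsAsFactors = FALSE)

# Index the lookup table by the code column: one value comes back per row.
# unname() drops the codes that would otherwise ride along as element names.
flights$Carrier <- unname(carrier_names[flights$UniqueCarrier])

print(flights)
```

Each row's code selects the matching element of the lookup table, so repeated codes simply repeat the corresponding name.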

Upvotes: 3

Adam

Reputation: 668

names is a named character vector. This is similar to a Python dictionary, where each key maps to a value. In this case, the key is the carrier code and the value is the full name.

In R, you can index a vector with another vector. In this case you are indexing the "dictionary" with the vector of abbreviation codes, and it returns a vector of the same length as the index, holding the matching values.
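Two places where the dictionary analogy differs are worth a quick sketch. The `lookup` table and the codes below are hypothetical. A code that is missing from the table gives NA rather than an error, and a factor is indexed by its underlying integer codes rather than by its labels, so it should be converted with as.character() first.

```r
# Hypothetical, shortened lookup table
lookup <- c("AS" = "Alaska", "AA" = "American")

# "ZZ" is deliberately not in the table: it comes back as NA, not an error
codes <- c("AS", "ZZ", "AA")
result <- unname(lookup[codes])
print(result)

# A factor's levels are sorted ("AA", "AS"), so its integer codes here are 2, 1
codes_factor <- factor(c("AS", "AA"))

# Indexing by the factor uses those integers as positions: wrong names
by_code <- unname(lookup[codes_factor])
print(by_code)

# Converting to character first looks the values up by label: right names
by_label <- unname(lookup[as.character(codes_factor)])
print(by_label)
```

If the code column in your own data is stored as a factor, wrapping it in as.character() before indexing avoids the silent mismatch.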

Upvotes: 2
