Reputation: 1083

Index/matching based on two columns in R, is it possible?

I have the following "index" list that contains the name of six types of cars and their ID associated (DF1).

DF1 = structure(list(Car = c("Toyota", "Mitsubishi", "Audi", 
"Merecedes", "Ford", "Fiat"), ID = structure(c(1L, 
2L, 3L, 4L, 5L, 6L), .Label = c("1", "2", "3", "4", "5", 
"6"), class = "factor")), .Names = c("Car", 
"ID"), row.names = c(NA, 6L), class = "data.frame")

Then I have this list of various information (DF2).

DF2 = structure(list(City = c("New York City", "Los Angeles", "Chicago", "Miami", "Dallas", "Atlanta"), `2005` = c("", "", "", 
"Mercedes, Mitsubishi", "Ford", ""), `2006` = c("", 
"", "", "Ford", "Audi", ""), `2007` = c("Toyota", 
"", "Toyota", "", "Fiat, Audi, Audi", ""
), `2008` = c("Fiat", "", "", "Mitsubishi, Merecedes, Fiat, Mitsubishi", 
"Audi, Fiat, Merecedes", ""), `2009` = c("Fiat", 
"", "", "Audi, Toyota", "Toyota, Audi, Fiat", 
""), `2010` = c("", "", "", "Toyota, Merecedes, Merecedes, Audi, Mitsubishi", 
"", ""), `2011` = c("", "", "", "", "Toyota", ""), `2012` = c("", 
"", "", "Merecedes, Ford, Merecedes, Toyota", "Toyota", 
"Fiat"), `2013` = c("Fiat", "", "Toyota", "", "", 
""), `2014` = c("", "", "Fiat, Mitsubishi", "", "Mitsubishi, Audi, Toyota, Merecedes, Toyota, Mitsubishi, Fiat, Mitsubishi, Fiat", 
""), `2015` = c("", "", "Toyota", "", "Toyota, Merecedes", 
""), `2016` = c("", "", "", "", "", ""), `Contact` = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
), `Time` = c("2011", "2015", "2015", "2006, 2006, 2005, 2005, 2007", 
"2014, 2011", "2007"), Cut = c("2011", "2015", "2015", "2005", 
"2011", "2007")), .Names = c("City", "2005", "2006", "2007", "2008", 
"2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", 
"Contact", "Time", "Cut"), row.names = c(NA, 
6L), class = "data.frame")

Columns 2 through 13 contain the name of the different cars. what I'd like R to do is to simply replace those names with the IDs from the "index" list above.

I have tried to use the replace function such as this:

replace(DF2, DF1$Car, DF2$ID)

But this doesn't seem to quite work. I'm open for other suggestions if replace is not the best solution.

Upvotes: 1

Answers (3)

Esther

Reputation: 1115

And here is a method in base using apply and gsub

DF2m <- as.matrix(DF2)

apply(DF1, 1, function(x) {
  DF2m <<- gsub(x[1], x[2], DF2m)
})

DF2new <- as.data.frame(as.matrix(DF2m, nrow=nrow(DF2)), stringsAsFactors = FALSE)

Upvotes: 0

Melissa Key

Reputation: 4551

Here's a method using the tidyverse suite of packages. gather and spread function similar to reshape in base. The actual replacement is done using the match function, but we need to split up the list of cars by ", " first using str_split, do the replacement, then paste it all back together.

DF2 %>%
  gather(year, cars, `2005`:`2016`) %>%
  mutate(year, cars_id = map_chr(str_split(cars, ", "), ~ if(length(.x > 0)) paste(unique(DF1$ID[match(.x, DF1$Car)]), collapse = ", ") else "")) %>%
  select(-cars) %>%
  spread(year, cars_id)

Upvotes: 5

s_baldur

Reputation: 33498

rep_carn_wid <- function(x, lo = DF1) {
  for (i in 1:nrow(lo)) {
    x <- gsub(lo[i, 1], lo[i, 2], x)
  }
  x
}

DF2[2:13] <- lapply(DF2[2:13],  rep_carn_wid)

Upvotes: 0

Index/matching based on two columns in R, is it possible?

Answers (3)

Related Questions