Reputation: 11687

Use look-up table in R to create new column based on another df

I'm having a hard time understanding how to do the equivalent of a look-up table in R. I've seen people suggest you should use "merges" in lieu of look-up tables, but I'm not sure what the right approach is:

Let's say I have the following:

set.seed(42)
person_ids <- data.frame(person_1_id = stringi::stri_rand_strings(100, 10, '[A-Z]'), 
                 person_2_id = stringi::stri_rand_strings(100, 10, '[A-Z]'))

team_id_lookup <- data.frame(person_id = stringi::stri_rand_strings(100, 10, '[A-Z]'), 
                  team_ids = floor(runif(100, min=0, max=500)))

I want to create two new columns in person_ids -- team_id_1 and team_id_2, which use the look up dataframe to find the corresponding team_ids for a given person_id and taking that value.

What is the right approach here?

Upvotes: 2

Answers (3)

coffeinjunky

Reputation: 11514

Just to point a base R solution out here:

The idea of a look-up table is that there is a table where you can easily look up information based on the index. The index in R would be e.g. the row name of a table, which you can assign for instance with:

rownames(team_id_lookup) <- team_id_lookup$person_id

You can then use this to lookup your team memberships:

person_ids$team1 <- team_id_lookup[person_ids$person_1_id,"team_ids"]
person_ids$team2 <- team_id_lookup[person_ids$person_2_id,"team_ids"]
person_ids
   person_1_id person_2_id team1 team2
1            X           L   257    NA
2            Y           S   223    NA
3            H           Y    NA   223
4            V           G    NA    NA
5            Q           M    NA    NA
6            N           Y    NA   223
7            T           Z    NA   452
8            D           D   195   195
9            R           M    NA    NA
10           S           O    NA    NA

For reproducability: I used the same data as @AntoniosK but set stringsAsFactors to FALSE, as per Antonios comment.

set.seed(42)
person_ids <- data.frame(
        person_1_id = stringi::stri_rand_strings(10, 1, '[A-Z]'), 
        person_2_id = stringi::stri_rand_strings(10, 1, '[A-Z]'), 
        stringsAsFactors = F)
team_id_lookup <- data.frame(
        person_id = stringi::stri_rand_strings(5, 1, '[A-Z]'), 
        team_ids = floor(runif(5, min=0, max=500)), 
        stringsAsFactors = F)

Upvotes: 2

Elin

Reputation: 6755

I'm not completely following the logic of having only 100 team values so I'm doing 200. But this is another approach.

set.seed(42)
person_ids <- data.frame(person_1_id = stringi::stri_rand_strings(100, 10, '[A-Z]'), 
                         person_2_id = stringi::stri_rand_strings(100, 10, '[A-Z]'), 
                                        stringsAsFactors = FALSE)

all_pid <- c(person_ids$person_1_id, person_ids$person_2_id)

team_ids <- floor(runif(200, min=0, max=500))
names(team_ids) <- all_pid

person_ids$team_id_1 <- team_ids[person_ids$person_1_id]
person_ids$team_id_2 <- team_ids[person_ids$person_2_id]
head(person_ids)

  person_1_id person_2_id team_id_1 team_id_2
1  XYHVQNTDRS  WBVOMAOSGV       128       207
2  LSYGMYZDMO  KODFEVCGUH       362       422
3  XDZYCNKXLV  WBIIDJMBZX        78       428
4  TVKRAVAFXP  ZEBDJOYQMC       157       225
5  JLAZLYXQZQ  YJJQSFPVZA       148       366
6  IJKUBTREGN  VFCRHVQNAH       339       337

Upvotes: 2

AntoniosK

Reputation: 16121

set.seed(42)
person_ids <- data.frame(person_1_id = stringi::stri_rand_strings(10, 1, '[A-Z]'), 
                         person_2_id = stringi::stri_rand_strings(10, 1, '[A-Z]'))

team_id_lookup <- data.frame(person_id = stringi::stri_rand_strings(5, 1, '[A-Z]'), 
                             team_ids = floor(runif(5, min=0, max=500)))

library(dplyr)

person_ids %>%
  left_join(team_id_lookup, by=c("person_1_id"="person_id")) %>%
  left_join(team_id_lookup, by=c("person_2_id"="person_id")) %>%
  rename(team_id_1 = team_ids.x,
         team_id_2 = team_ids.y)

#    person_1_id person_2_id team_id_1 team_id_2
# 1            X           L       257        NA
# 2            Y           S       223        NA
# 3            H           Y        NA       223
# 4            V           G        NA        NA
# 5            Q           M        NA        NA
# 6            N           Y        NA       223
# 7            T           Z        NA       452
# 8            D           D       195       195
# 9            R           M        NA        NA
# 10           S           O        NA        NA

Only people found in your look up table will have a match. All other will have NA.

Upvotes: 4

Use look-up table in R to create new column based on another df

Answers (3)

Related Questions