Reputation: 11687
I'm having a hard time understanding how to do the equivalent of a look-up table in R. I've seen people suggest you should use "merges" in lieu of look-up tables, but I'm not sure what the right approach is:
Let's say I have the following:
set.seed(42)
person_ids <- data.frame(person_1_id = stringi::stri_rand_strings(100, 10, '[A-Z]'),
person_2_id = stringi::stri_rand_strings(100, 10, '[A-Z]'))
team_id_lookup <- data.frame(person_id = stringi::stri_rand_strings(100, 10, '[A-Z]'),
team_ids = floor(runif(100, min=0, max=500)))
I want to create two new columns in person_ids
-- team_id_1
and team_id_2
, which use the look up dataframe to find the corresponding team_ids for a given person_id and taking that value.
What is the right approach here?
Upvotes: 2
Views: 209
Reputation: 11514
Just to point a base R solution out here:
The idea of a look-up table is that there is a table where you can easily look up information based on the index. The index in R would be e.g. the row name of a table, which you can assign for instance with:
rownames(team_id_lookup) <- team_id_lookup$person_id
You can then use this to lookup your team memberships:
person_ids$team1 <- team_id_lookup[person_ids$person_1_id,"team_ids"]
person_ids$team2 <- team_id_lookup[person_ids$person_2_id,"team_ids"]
person_ids
person_1_id person_2_id team1 team2
1 X L 257 NA
2 Y S 223 NA
3 H Y NA 223
4 V G NA NA
5 Q M NA NA
6 N Y NA 223
7 T Z NA 452
8 D D 195 195
9 R M NA NA
10 S O NA NA
For reproducability: I used the same data as @AntoniosK but set stringsAsFactors to FALSE, as per Antonios comment.
set.seed(42)
person_ids <- data.frame(
person_1_id = stringi::stri_rand_strings(10, 1, '[A-Z]'),
person_2_id = stringi::stri_rand_strings(10, 1, '[A-Z]'),
stringsAsFactors = F)
team_id_lookup <- data.frame(
person_id = stringi::stri_rand_strings(5, 1, '[A-Z]'),
team_ids = floor(runif(5, min=0, max=500)),
stringsAsFactors = F)
Upvotes: 2
Reputation: 6755
I'm not completely following the logic of having only 100 team values so I'm doing 200. But this is another approach.
set.seed(42)
person_ids <- data.frame(person_1_id = stringi::stri_rand_strings(100, 10, '[A-Z]'),
person_2_id = stringi::stri_rand_strings(100, 10, '[A-Z]'),
stringsAsFactors = FALSE)
all_pid <- c(person_ids$person_1_id, person_ids$person_2_id)
team_ids <- floor(runif(200, min=0, max=500))
names(team_ids) <- all_pid
person_ids$team_id_1 <- team_ids[person_ids$person_1_id]
person_ids$team_id_2 <- team_ids[person_ids$person_2_id]
head(person_ids)
person_1_id person_2_id team_id_1 team_id_2
1 XYHVQNTDRS WBVOMAOSGV 128 207
2 LSYGMYZDMO KODFEVCGUH 362 422
3 XDZYCNKXLV WBIIDJMBZX 78 428
4 TVKRAVAFXP ZEBDJOYQMC 157 225
5 JLAZLYXQZQ YJJQSFPVZA 148 366
6 IJKUBTREGN VFCRHVQNAH 339 337
Upvotes: 2
Reputation: 16121
set.seed(42)
person_ids <- data.frame(person_1_id = stringi::stri_rand_strings(10, 1, '[A-Z]'),
person_2_id = stringi::stri_rand_strings(10, 1, '[A-Z]'))
team_id_lookup <- data.frame(person_id = stringi::stri_rand_strings(5, 1, '[A-Z]'),
team_ids = floor(runif(5, min=0, max=500)))
library(dplyr)
person_ids %>%
left_join(team_id_lookup, by=c("person_1_id"="person_id")) %>%
left_join(team_id_lookup, by=c("person_2_id"="person_id")) %>%
rename(team_id_1 = team_ids.x,
team_id_2 = team_ids.y)
# person_1_id person_2_id team_id_1 team_id_2
# 1 X L 257 NA
# 2 Y S 223 NA
# 3 H Y NA 223
# 4 V G NA NA
# 5 Q M NA NA
# 6 N Y NA 223
# 7 T Z NA 452
# 8 D D 195 195
# 9 R M NA NA
# 10 S O NA NA
Only people found in your look up table will have a match. All other will have NA
.
Upvotes: 4