Reputation: 77
I am working with the dplyr package in R. Let's say I have a data frame of names and ids
library(dplyr)
df <- data.frame(dID  = c(1, 2, 1),
                 name = c("a", "a", "b"))
and I want to look up each id in another data frame and get the information I need.
db <- data.frame(dID   = c(1, 2, 3, 4),
                 info1 = c("A", "B", "C", "D"),
                 info2 = c("AA", "BB", "CC", "DD"))
Currently, I am using the following code.
df %>% rowwise() %>%
  mutate(INFO1 = (function(id){paste(db %>% filter(dID == id) %>% select(info1))})(dID),
         INFO2 = (function(id){paste(db %>% filter(dID == id) %>% select(info2))})(dID))
I was wondering whether there is a way to avoid repeating this part of the code
db %>% filter(dID == id)
by storing it in a temporary variable. For example, when I change my code to
df %>% rowwise() %>%
  mutate(tmp <- db %>% filter(dID == dID),
         INFO1 = paste(tmp %>% select(info1)),
         INFO2 = paste(tmp %>% select(info2))
  )
I get this error
Error in mutate_impl(.data, dots) : Column `tmp <- db %>% filter(dID == dID)` is of unsupported class data.frame
Is there any way to make the code tidier and faster?
Upvotes: 0
Views: 1387
Reputation: 50678
I agree with Marius' comment. To demonstrate, the following reproduces the result from your rowwise dplyr chain
left_join(df, db) %>% mutate_at(vars(starts_with("info")), ~as.numeric(as.factor(.x)))
#  dID name info1 info2
#1   1    a     1     1
#2   2    a     2     2
#3   1    b     1     1
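The numbers above are factor codes, produced by as.numeric(as.factor(.x)). If the goal is the actual info1 and info2 values from db, a minimal sketch of the same join could look like the following. It assumes the example data from the question, and it makes three additions that are not in the original code: stringsAsFactors = FALSE so the info columns are character in every R version, an explicit by = "dID", and a rename() to the upper-case column names used in the question.

```r
library(dplyr)

# Example data from the question; stringsAsFactors = FALSE is an added
# assumption so that info1/info2 are plain character columns.
df <- data.frame(dID  = c(1, 2, 1),
                 name = c("a", "a", "b"),
                 stringsAsFactors = FALSE)
db <- data.frame(dID   = c(1, 2, 3, 4),
                 info1 = c("A", "B", "C", "D"),
                 info2 = c("AA", "BB", "CC", "DD"),
                 stringsAsFactors = FALSE)

# One vectorised lookup for all rows instead of one filter() per row.
result <- df %>%
  left_join(db, by = "dID") %>%
  rename(INFO1 = info1, INFO2 = info2)

print(result)
#   dID name INFO1 INFO2
# 1   1    a     A    AA
# 2   2    a     B    BB
# 3   1    b     A    AA
```

Because the join matches every row of df against db in a single call, no temporary variable or rowwise() step is needed. Ids in df with no match in db would get NA in INFO1 and INFO2.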
Upvotes: 1