Reputation: 611
I have 2 dataframes with different information (there's only 1 column which is the same across both dataframes: subject
). Also, one of the dataframes is longer than the other (more columns and more rows).
So, I have something like this:
# drataframe 1
subject var1 var2
101 A B
102 C D
103 E F
...
# dataframe 2
subject varW varX varY varZ
101 1 2 1 4
101 2 1 1 4
101 4 1 1 4
102 2 1 2 5
102 1 1 2 5
102 2 4 2 5
103 2 3 3 1
103 1 2 3 1
103 4 1 3 1
Note that each subject is repeated several times in dataframe 2, whereas in dataframe 1 each subject appears only once.
What I want to do is to append columns varY
and varZ
to dataframe 1.
I have tried using select
(to select those two columns) and then inner_join
(to join both dataframes). However, I get 3 rows for each subject, because each subject is repeated 3 times in dataframe 2. I would like to have only 1 row for each subject in dataframe 1 once joined, since there is only one value in varY
and varZ
for each subject.
In other words, this would be my desired output:
subject var1 var2 varY varZ
101 A B 1 4
102 C D 2 5
103 E F 3 1
Upvotes: 2
Views: 138
Reputation: 31452
using library(data.table)
df1[df2[, .SD[1], .SDcols=c("varY", "varZ"), by=subject], on="subject"]
or alternatively (ty comment from @thelatemail),
df2[, .(subject, varY, varZ)][df1, on=.(subject), mult="first"]
The data:
df1 = fread("
subject var1 var2
101 A B
102 C D
103 E F
")
df2 = fread("
subject varW varX varY varZ
101 1 2 1 4
101 2 1 1 4
101 4 1 1 4
102 2 1 2 5
102 1 1 2 5
102 2 4 2 5
103 2 3 3 1
103 1 2 3 1
103 4 1 3 1
")
Upvotes: 3
Reputation: 21908
I hope this is what you want:
library(dplyr)
library(tidyr)
df2 %>%
left_join(df1, by = "subject") %>%
select(-c(varX, varW)) %>%
group_by(subject) %>%
slice_head(n = 1) %>%
relocate(subject, var1, var2)
# A tibble: 3 x 5
# Groups: subject [3]
subject var1 var2 varY varZ
<dbl> <chr> <chr> <dbl> <dbl>
1 101 A B 1 4
2 102 C D 2 5
3 103 E F 3 1
data
df2 <- tribble(
~subject, ~varW, ~varX, ~varY, ~varZ,
101, 1, 2, 1, 4,
101, 2, 1, 1, 4,
101, 4, 1, 1, 4,
102, 2, 1, 2, 5,
102, 1, 1, 2, 5,
102, 2, 4, 2, 5,
103, 2, 3, 3, 1,
103, 1, 2, 3, 1,
103, 4, 1, 3, 1
)
df1 <- tribble(
~subject, ~var1, ~var2,
101, "A", "B",
102, "C", "D",
103, "E", "F"
)
Upvotes: 1
Reputation: 887088
If the select
ed columns are having unique values, then use distinct
and do a join by 'subject'
library(dplyr)
distinct(df2, subject, varY, varZ) %>%
right_join(df1) %>%
select(names(df1), everything())
-output
# subject var1 var2 varY varZ
#1 101 A B 1 4
#2 102 C D 2 5
#3 103 E F 3 1
df1 <- structure(list(subject = 101:103, var1 = c("A", "C", "E"),
var2 = c("B",
"D", "F")), class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(subject = c(101L, 101L, 101L, 102L, 102L, 102L,
103L, 103L, 103L), varW = c(1L, 2L, 4L, 2L, 1L, 2L, 2L, 1L, 4L
), varX = c(2L, 1L, 1L, 1L, 1L, 4L, 3L, 2L, 1L), varY = c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), varZ = c(4L, 4L, 4L, 5L, 5L,
5L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -9L
))
Upvotes: 2