Reputation: 945
I have two data frames like the ones pasted below:
df1:
v a
w b
x c
y d
z e
df2:
v A
v C
w A
x B
y C
z C
z B
As you can see the values v-z link the two data frames.
I have set up a third data frame like this:
df3:
A B C
a NA NA NA
b NA NA NA
c NA NA NA
d NA NA NA
e NA NA NA
And I want to fill it with 0s and 1s, like this:
df3:
A B C
a 1 0 1
b 1 0 0
c 0 1 0
d 0 0 1
e 0 1 1
Basically, if we take the first line of df1 as an example, we can see that "a" corresponds to "v". In df2, we can see that "v" corresponds to both "A" and "C", so in df3, in the "a" row, there will be a "1" in the "A" and "C" column.
Any ideas on how to approach this problem?
I thought about applying a function over df3, looking up the row and column names of each element and see if they are joined in df1 and df2, but I am not sure how to do that.
I hope I explained this clearly enough, please don't hesitate to tell me if it is not clear, I will try to improve my explanation. I also apologize for the clumsy title, I don't know how these manipulations are called.
Thank you!
Upvotes: 0
Views: 210
Reputation: 193637
Essentially you want table
, which is what the other answers have already arrived at, but they proposed approaching the problem with merge
(or its "tidyverse" variant, full_join
).
However, a more efficient approach would be to simply use match
or factor
and then table
. Both of the following would also give you the output you're expecting.
# match -- probably slightly faster than factor
table(df1$V2[match(df2$V1, df1$V1)], df2$V2)
# factor -- a little bit easier to read
table(factor(df2$V1, df1$V1, df1$V2), df2$V2)
Upvotes: 1
Reputation: 389135
One base R option is using merge
. We merge
the df1
and df2
by common column V1
and then find the frequency using table
.
m <- merge(df1, df2, by = "V1", all.x = TRUE)
table(m$V2.x, m$V2.y)
# A B C
# a 1 0 1
# b 1 0 0
# c 0 1 0
# d 0 0 1
# e 0 1 1
where m
after merging comes out to be:
V1 V2.x V2.y
1 v a A
2 v a C
3 w b A
4 x c B
5 y d C
6 z e C
7 z e B
Upvotes: 1
Reputation: 2434
Here is a way:
library(dplyr)
full_join(df1, df2, by=c("V1"="V1")) %>%
select(V2.x, V2.y) %>%
table()
V2.y
V2.x A B C
a 1 0 1
b 1 0 0
c 0 1 0
d 0 0 1
e 0 1 1
df1 <- structure(list(V1 = c("v", "w", "x", "y", "z"), V2 = c("a", "b",
"c", "d", "e")), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-5L))
df2 <- structure(list(V1 = c("v", "v", "w", "x", "y", "z", "z"), V2 = c("A",
"C", "A", "B", "C", "C", "B")), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 2