arielle
arielle

Reputation: 945

R: Filling a matrix with binary values depending on the content of other data frames

I have two data frames like the ones pasted below:

df1:

v   a
w   b
x   c
y   d
z   e

df2:

v   A
v   C
w   A
x   B
y   C
z   C
z   B

As you can see the values v-z link the two data frames.

I have set up a third data frame like this:

df3:

    A   B   C
a   NA  NA  NA
b   NA  NA  NA
c   NA  NA  NA
d   NA  NA  NA
e   NA  NA  NA

And I want to fill it with 0s and 1s, like this:

df3:

    A   B   C
a   1   0   1
b   1   0   0
c   0   1   0
d   0   0   1
e   0   1   1

Basically, if we take the first line of df1 as an example, we can see that "a" corresponds to "v". In df2, we can see that "v" corresponds to both "A" and "C", so in df3, in the "a" row, there will be a "1" in the "A" and "C" column.

Any ideas on how to approach this problem?

I thought about applying a function over df3, looking up the row and column names of each element and see if they are joined in df1 and df2, but I am not sure how to do that.

I hope I explained this clearly enough, please don't hesitate to tell me if it is not clear, I will try to improve my explanation. I also apologize for the clumsy title, I don't know how these manipulations are called.

Thank you!

Upvotes: 0

Views: 210

Answers (3)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193637

Essentially you want table, which is what the other answers have already arrived at, but they proposed approaching the problem with merge (or its "tidyverse" variant, full_join).

However, a more efficient approach would be to simply use match or factor and then table. Both of the following would also give you the output you're expecting.

# match -- probably slightly faster than factor
table(df1$V2[match(df2$V1, df1$V1)], df2$V2)

# factor -- a little bit easier to read
table(factor(df2$V1, df1$V1, df1$V2), df2$V2) 

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389135

One base R option is using merge. We merge the df1 and df2 by common column V1 and then find the frequency using table.

m <- merge(df1, df2, by = "V1", all.x = TRUE)
table(m$V2.x, m$V2.y)

#   A B C
# a 1 0 1
# b 1 0 0
# c 0 1 0
# d 0 0 1
# e 0 1 1

where m after merging comes out to be:

  V1 V2.x V2.y
1  v    a    A
2  v    a    C
3  w    b    A
4  x    c    B
5  y    d    C
6  z    e    C
7  z    e    B

Upvotes: 1

JasonWang
JasonWang

Reputation: 2434

Here is a way:

library(dplyr)
full_join(df1, df2, by=c("V1"="V1")) %>% 
  select(V2.x, V2.y) %>%
  table()

    V2.y
V2.x A B C
   a 1 0 1
   b 1 0 0
   c 0 1 0
   d 0 0 1
   e 0 1 1

Data

df1 <- structure(list(V1 = c("v", "w", "x", "y", "z"), V2 = c("a", "b", 
"c", "d", "e")), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-5L))

df2 <- structure(list(V1 = c("v", "v", "w", "x", "y", "z", "z"), V2 = c("A", 
"C", "A", "B", "C", "C", "B")), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-7L))

Upvotes: 2

Related Questions