Reputation: 906

Taking the difference between two data frames in R

I was looking for an easy way of doing it with R, but I couldn't find it, so I'm posting it here.

Let's assume that I have the following data frame

 state1     score1     state2    score2
   A          1          A          3
   A          2          B          13
   A          1          C          5
   B          10         A          1
   B          5          B          0
   B          3          C          0
   C          2          A          5
   C          0          B          6
   C          1          C          3

and the 2nd data frame is

 state1     state2    score
   A          A         0
   A          B         -1
   A          C         3
   B          A         2
   B          B         1
   B          C         1
   C          A         2
   C          B         2
   C          C         1

Let's call the first data frame, df1, and call the second margin, df2.

Look at the df1, df2 having the same (state1, state2) pairs. For each of those matching pair, subtract score in df2 from score1 in df1 and call it newscore1, and subtract score in df2 from score2 in df2 and call it newscore2. For this case, the following would be desired output.

 state1   newscore1    state2    newscore2
   A          1          A          3
   A          3          B          14
   A          -2         C          2
   B          8          A          -1
   B          4          B          -1
   B          2          C          -1
   C          0          A          3
   C          -2         B          4
   C          0          C          2

Is there a one/two-liner solution to it? otherwise, I have to do

1) re-order df2 so that state1, state2 match with df1 (in this case, I don't have to do anything since row 1 in df1 already matches with row 1 in df2, row 2 in df1 already matches with row 2 in df2 and so on)

2) cbind the df1$score1-df2$score, df1$score2-df2$score

Upvotes: 0

Answers (3)

SymbolixAU

Reputation: 26258

a one-liner using library(data.table).

Do the join (as the other solutions have suggested), and then use the update-by-reference operator (:=) to add the new column in the one step.

df1[ df2, on = c("state1","state2"), `:=`(newscore1 = score1 - score, newscore2 = score2 - score)]
df1
#    state1 score1 state2 score2 newscore1 newscore2
# 1:      A      1      A      3         1         3
# 2:      A      2      B     13         3        14
# 3:      A      1      C      5        -2         2
# 4:      B     10      A      1         8        -1
# 5:      B      5      B      0         4        -1
# 6:      B      3      C      0         2        -1
# 7:      C      2      A      5         0         3
# 8:      C      0      B      6        -2         4
# 9:      C      1      C      3         0         2

Upvotes: 3

Parfait

Reputation: 107767

Simply merge the two and subtract column by column:

dfm <- merge(df1, df2, by=c("state1", "state2"))
dfm$newscore1 <- dfm$score1 - dfm$score
dfm$newscore2 <- dfm$score2 - dfm$score

dfm <- dfm[c("state1", "newscore1", "state2", "newscore2")]

Upvotes: 2

Rose Hartman

Reputation: 487

The cleanest way to do this will be with a join operation. I like dplyr for this. For example:

state1 <- gl(3, k=3, labels=c("A", "B", "C"))
score1 <- sample(1:10, size = 9, replace = TRUE)
state2 <- gl(3, k=1, length=9, labels=c("A", "B", "C"))
score2 <- sample(1:10, size = 9, replace = TRUE)
df1 <- data.frame(state1, score1, state2, score2)

Here's that first dataframe:

> df1
  state1 score1 state2 score2
1      A      3      A      6
2      A      8      B      2
3      A      3      C      6
4      B      2      A      8
5      B      3      B     10
6      B      3      C      6
7      C      7      A      2
8      C      9      B      5
9      C      6      C     10

score <- sample(-5:5, size = 9, replace = TRUE)
df2 <- data.frame(state1, state2, score)

And here's the second:

> df2
  state1 state2 score
1      A      A    -1
2      A      B     1
3      A      C    -2
4      B      A     5
5      B      B     5
6      B      C     5
7      C      A     0
8      C      B    -1
9      C      C    -3

combined_df <- df1 %>% 
  # line df1 and df2 up by state1 and state2, and combine them
  full_join(df2, by=c("state1", "state2")) %>% 
  # calculate the new columns you need
  mutate(newscore1 = score1 - score, newscore2 = score2 - score) %>% 
  # drop the extra columns
  select(state1, newscore1, state2, newscore2)

> combined_df
  state1 newscore1 state2 newscore2
1      A         4      A         7
2      A         7      B         1
3      A         5      C         8
4      B        -3      A         3
5      B        -2      B         5
6      B        -2      C         1
7      C         7      A         2
8      C        10      B         6
9      C         9      C        13

Upvotes: 2

Taking the difference between two data frames in R

Answers (3)

Related Questions