Reputation: 896
I was looking for an easy way of doing it with R, but I couldn't find it, so I'm posting it here.
Let's assume that I have the following data frame
state1 score1 state2 score2
A 1 A 3
A 2 B 13
A 1 C 5
B 10 A 1
B 5 B 0
B 3 C 0
C 2 A 5
C 0 B 6
C 1 C 3
and the 2nd data frame is
state1 state2 score
A A 0
A B -1
A C 3
B A 2
B B 1
B C 1
C A 2
C B 2
C C 1
Let's call the first data frame, df1, and call the second margin, df2.
Look at the df1, df2 having the same (state1, state2) pairs. For each of those matching pair, subtract score in df2 from score1 in df1 and call it newscore1, and subtract score in df2 from score2 in df2 and call it newscore2. For this case, the following would be desired output.
state1 newscore1 state2 newscore2
A 1 A 3
A 3 B 14
A -2 C 2
B 8 A -1
B 4 B -1
B 2 C -1
C 0 A 3
C -2 B 4
C 0 C 2
Is there a one/two-liner solution to it? otherwise, I have to do
1) re-order df2 so that state1, state2 match with df1 (in this case, I don't have to do anything since row 1 in df1 already matches with row 1 in df2, row 2 in df1 already matches with row 2 in df2 and so on)
2) cbind the df1$score1-df2$score, df1$score2-df2$score
Upvotes: 0
Views: 4582
Reputation: 26248
a one-liner using library(data.table)
.
Do the join (as the other solutions have suggested), and then use the update-by-reference operator (:=
) to add the new column in the one step.
df1[ df2, on = c("state1","state2"), `:=`(newscore1 = score1 - score, newscore2 = score2 - score)]
df1
# state1 score1 state2 score2 newscore1 newscore2
# 1: A 1 A 3 1 3
# 2: A 2 B 13 3 14
# 3: A 1 C 5 -2 2
# 4: B 10 A 1 8 -1
# 5: B 5 B 0 4 -1
# 6: B 3 C 0 2 -1
# 7: C 2 A 5 0 3
# 8: C 0 B 6 -2 4
# 9: C 1 C 3 0 2
Upvotes: 3
Reputation: 107567
Simply merge the two and subtract column by column:
dfm <- merge(df1, df2, by=c("state1", "state2"))
dfm$newscore1 <- dfm$score1 - dfm$score
dfm$newscore2 <- dfm$score2 - dfm$score
dfm <- dfm[c("state1", "newscore1", "state2", "newscore2")]
Upvotes: 2
Reputation: 487
The cleanest way to do this will be with a join operation. I like dplyr
for this. For example:
state1 <- gl(3, k=3, labels=c("A", "B", "C"))
score1 <- sample(1:10, size = 9, replace = TRUE)
state2 <- gl(3, k=1, length=9, labels=c("A", "B", "C"))
score2 <- sample(1:10, size = 9, replace = TRUE)
df1 <- data.frame(state1, score1, state2, score2)
Here's that first dataframe:
> df1
state1 score1 state2 score2
1 A 3 A 6
2 A 8 B 2
3 A 3 C 6
4 B 2 A 8
5 B 3 B 10
6 B 3 C 6
7 C 7 A 2
8 C 9 B 5
9 C 6 C 10
score <- sample(-5:5, size = 9, replace = TRUE)
df2 <- data.frame(state1, state2, score)
And here's the second:
> df2
state1 state2 score
1 A A -1
2 A B 1
3 A C -2
4 B A 5
5 B B 5
6 B C 5
7 C A 0
8 C B -1
9 C C -3
combined_df <- df1 %>%
# line df1 and df2 up by state1 and state2, and combine them
full_join(df2, by=c("state1", "state2")) %>%
# calculate the new columns you need
mutate(newscore1 = score1 - score, newscore2 = score2 - score) %>%
# drop the extra columns
select(state1, newscore1, state2, newscore2)
> combined_df
state1 newscore1 state2 newscore2
1 A 4 A 7
2 A 7 B 1
3 A 5 C 8
4 B -3 A 3
5 B -2 B 5
6 B -2 C 1
7 C 7 A 2
8 C 10 B 6
9 C 9 C 13
Upvotes: 2