Lucinho91
Lucinho91

Reputation: 195

How to subtract data frame column from another data frame column if condition is met?

I have two simple data frames containing both the columns "word" and "n" for how often a certain word occurred. Here is an example:

df1 <- data.frame(word=c("beautiful","nice","like","good"),n=c(400,378,29,10))
df2 <- data.frame(word=c("beautiful","nice","like","good","wonderful","awesome","sad","happy"),n=c(6000,20,5,150,300,26,17,195))

Besides the words of df1, df2 contains much more words so df1 is only a small subset of df2.

I found the words, that are contained in both, df1 and df2. Now I would like to subtract the word countings of df1 from df2 if the specific word is contained in df2 , meaning I would like to do the following:

I hope that my problem is clear.

I already found all the words from df1 that are also contained in df2

df1 %>% filter(df1$word %in% df2$word)

However, I am struggling with the subtracting command based on the condition that the words in df1 must be also in df2 and then only subtract df2$n - df1$n

Thank you for your help!

Upvotes: 2

Views: 2341

Answers (4)

IRTFM
IRTFM

Reputation: 263311

Here's a vectorized base solution where Boolean multiplication is used to replace an if-then construct used in the for-lop from @Rob:

 df2$n.adjusted <- df2$n - (df2$word %in% df1$word)* # zero if no match
                                 df1$n[ match(df1$word, df2$word) ] # gets order correct
> df2
       word    n n.adjusted
1 beautiful 6000       5600
2      nice   20       -358
3      like    5        -24
4      good  150        140
5 wonderful  300        300
6   awesome   26         26
7       sad   17         17
8     happy  195        195

Here's the example I used to test where the order of the df1 words was not the same as the order in df2 and the lengths were not an even multiple:

> df1 <-data.frame(word=c("nice","beautiful","like","good"),n=c(378,400,29,10))
> df2 <- data.frame(word=c("beautiful","nice","like","good","wonderful","awesome","sad"),n=c(6000,20,5,150,300,26,17))
> 
>  df1
       word   n
1      nice 378
2 beautiful 400
3      like  29
4      good  10
>  df2
       word    n
1 beautiful 6000
2      nice   20
3      like    5
4      good  150
5 wonderful  300
6   awesome   26
7       sad   17
> df2$n.adjusted <- df2$n - (df2$word %in% df1$word)*df1$n[match(df1$word, df2$word)]
Warning message:
In (df2$word %in% df1$word) * df1$n[match(df1$word, df2$word)] :
  longer object length is not a multiple of shorter object length
> df2
       word    n n.adjusted
1 beautiful 6000       5600
2      nice   20       -358
3      like    5        -24
4      good  150        140
5 wonderful  300        300
6   awesome   26         26
7       sad   17         17

Upvotes: 2

Rob
Rob

Reputation: 277

Here is a quick solution using a for loop and the %in% operator.

df2$diff <- NA
for (i in 1:nrow(df2)) {
  if (df2$word[i] %in%  df1$word[i]) {
    df2$diff[i] <- df2$n[i] - df1$n[i]
  }
}
df2

Output:

> df2
       word    n diff
1 beautiful 6000 5600
2      nice   20 -358
3      like    5  -24
4      good  150  140
5 wonderful  300   NA
6   awesome   26   NA
7       sad   17   NA
8     happy  195   NA

Upvotes: 2

IceCreamToucan
IceCreamToucan

Reputation: 28675

require(dplyr)
 df1 %>% 
  inner_join(df2, by = 'word') %>% 
  mutate(diff = n.y - n.x) %>% 
  select(word, diff)

Gives

       word diff
1 beautiful 5600
2      nice -358
3      like  -24
4      good  140

Upvotes: 2

Jilber Urbina
Jilber Urbina

Reputation: 61154

Using merge:

> df.tmp <- merge(df1, df2, by="word", all=TRUE)
> df.tmp$result <- df.tmp$n.y - df.tmp$n.x
> df.tmp
       word n.x  n.y result
1 beautiful 400 6000   5600
2      good  10  150    140
3      like  29    5    -24
4      nice 378   20   -358
5   awesome  NA   26     NA
6     happy  NA  195     NA
7       sad  NA   17     NA
8 wonderful  NA  300     NA

If you only want matched words

> df.tmp <- merge(df1, df2, by="word")
> df.tmp$result <- df.tmp$n.y - df.tmp$n.x
> df.tmp
       word n.x  n.y result
1 beautiful 400 6000   5600
2      good  10  150    140
3      like  29    5    -24
4      nice 378   20   -358

Upvotes: 4

Related Questions