James

Reputation: 526

Subtracting Two Dataframes - Retaining First Column Containing Names as Characters

I haven't found an exact answer for what I'm trying to do. I often have two dataframes to subtract, each one containing a "name" column. Example:

df1 <- data.frame(name = c("name1","name2","name3","name4"),
                  month1 = c(5,6,7,8),
                  month2 = c(10,11,12,13),
                  month3 = c(15,16,17,18))

df2 <- data.frame(name = c("name1","name2","name3","name4"),
                  month1 = c(22,23,24,25),
                  month2 = c(31,34,35,39),
                  month3 = c(42,43,45,46))

What I would like to do is create a df3 that is the subtraction df2 - df1 but retains the name column:


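library(dplyr)

# Keep only the name column
df3 <- df1 %>%
  select("name")

# Subtract the numeric columns (everything except column 1)
temp <- df2[,-c(1)] - df1[,-c(1)]

# Re-attach the name column
df3 <- bind_cols(df3, temp)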

print(df3)

   name month1 month2 month3
1 name1     17     21     27
2 name2     17     23     27
3 name3     17     23     28
4 name4     17     26     28

Now, it's only three short lines of code. However, is there no "one-liner" function that can subtract the dataframes while retaining the "name" column? It would essentially do the same as df2[,-c(1)] - df1[,-c(1)], but immediately re-add the "name" column rather than splitting the dataframe. Is that possible?

Upvotes: 0

Views: 47

Answers (2)

GuedesBF

Reputation: 9878

You can use dplyr and purrr:

library(dplyr)
library(purrr)

map2_dfc(df2[-1], df1[-1], ~ .x - .y) %>% cbind(df2[1], .)

   name month1 month2 month3
1 name1     17     21     27
2 name2     17     23     27
3 name3     17     23     28
4 name4     17     26     28

You can also wrap that inside a custom function:

# Returns df_1 - df_2, so call subtract_dfs(df2, df1) to get df2 - df1
subtract_dfs <- function(df_1, df_2) {
  purrr::map2_dfc(df_1[-1], df_2[-1], ~ .x - .y) %>% cbind(df_1[1], .)
}

EDIT

There is no need for a mapping function here, as the data frames can be subtracted all at once:

cbind(df2[1], df2[-1] - df1[-1])
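Note that this subtraction is purely positional: it assumes both data frames list the same names in the same row order. A minimal sketch of a guard for that assumption, using a shortened version of the example data (the check itself is an addition, not part of the original answer):

```r
# Shortened example data, same shape as in the question
df1 <- data.frame(name = c("name1", "name2"), month1 = c(5, 6))
df2 <- data.frame(name = c("name1", "name2"), month1 = c(22, 23))

# Guard (an assumption added here): stop early if the rows do not line up
stopifnot(identical(df1$name, df2$name))

# Positional subtraction, then re-attach the name column
df3 <- cbind(df2[1], df2[-1] - df1[-1])
print(df3)
```

If the guard fails, the rows would need to be aligned (for example by sorting or joining on name) before subtracting.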

Upvotes: 1

Till

Reputation: 6663

Your solution is already close to a one-liner. Written differently, it becomes:

bind_cols(name = df1$name, df2[,-1] - df1[,-1])

But I don't think that is an actual improvement, as you are losing some of the readability your original solution has.

Since you say you do this frequently, it might be a good idea to write your own function that you can reuse:

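library(dplyr)

subtract_dfs <- function(df1, df2, name = "name") {
  # Keep the name column from df1, then compute df2 - df1 on all other columns
  value_cols <- setdiff(names(df1), name)
  bind_cols(df1[name], df2[value_cols] - df1[value_cols])
}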

Now you can do:

subtract_dfs(df1, df2)

This lets you pass a custom column name through the name argument. The function could be improved further. For example, it could be extended to give correct results even if not all values of name are present in both data frames.
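One way the extension mentioned above might look is to join the two data frames on the name column before subtracting, so rows are matched by name rather than by position. This is only a sketch in base R: subtract_by_name is a hypothetical helper name, and dropping names that appear in only one data frame (an inner join) is an assumption about the desired behaviour.

```r
# Hypothetical helper: computes df2 - df1 after aligning rows on `name`.
# Names present in only one of the data frames are dropped (inner join).
subtract_by_name <- function(df1, df2, name = "name") {
  # Value columns shared by both data frames, excluding the key column
  value_cols <- setdiff(intersect(names(df1), names(df2)), name)
  # merge() suffixes the clashing value columns: ".2" for df2, ".1" for df1
  merged <- merge(df2, df1, by = name, suffixes = c(".2", ".1"))
  diffs <- merged[paste0(value_cols, ".2")] - merged[paste0(value_cols, ".1")]
  names(diffs) <- value_cols
  cbind(merged[name], diffs)
}

# Usage: rows in a different order, and one name missing from the second frame
a <- data.frame(name = c("x", "y", "z"), m = c(1, 2, 3))
b <- data.frame(name = c("z", "x"), m = c(30, 10))
res <- subtract_by_name(a, b)
print(res)
```

Because merge() sorts by the key by default, the result is ordered by name rather than by the original row order.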

Upvotes: 2
