sbac
sbac

Reputation: 2081

Mutating multiple columns in a data frame using dplyr

I have the following data frame df:

  v1 v2 v3 v4
1  1  5  7  4
2  2  6 10  3

And I want to obtain the following data frame df2 multiplying columns v1*v3 and v2*v4:

  v1 v2 v3 v4 v1v3 v2v4
1  1  5  7  4    7   20
2  2  6 10  3   20   18

How can I do that using dplyr? Using mutate_each?

I need a solution that can be generalized to a large number of variables and not only 4 (v1 to v4). This is the code to generate the example:

v1 <- c(1, 2)
v2 <- c(5,6)
v3 <- c(7, 10)
v4 <- c(4, 3)
df <- data.frame(v1, v2, v3, v4)
v1v3 <- c(v1 * v3)
v2v4 <- c(v2 * v4)
df2 <- cbind(df, v1v3, v2v4)

Upvotes: 24

Views: 92844

Answers (5)

ginn
ginn

Reputation: 151

I just found out!!!

In my case, I did:

mutate (log2 (across (starts_with ("ratio"), .names = "log2_{.col}")))

So I would transform to logarithm only the columns whose names started with "ratio". The new columns will have the same name as the originals, but their names will be preceded by the "log2_" prefix.

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 389215

We can use base R instead of using any extra packages like dplyr or data.table

We can use mapply to vectorize the operation for multiple vectors at the same time

n <- ncol(df)/2
mapply(`*`, df[1:n], df[(n + 1):ncol(df)])

#     v1 v2
#[1,]  7 20
#[2,] 20 18

We can merge (cbind) this dataframe to your original one then.


If you are interested in tidyverse solution the equivalent in purrr would be variants of map2

purrr::map2_df(df[1:n], df[(n + 1):ncol(df)], `*`)

# A tibble: 2 x 2
#     v1    v2
#  <dbl> <dbl>
#1     7    20
#2    20    18

Upvotes: 4

leerssej
leerssej

Reputation: 14988

You are really close.

df2 <- 
    df %>% 
    mutate(v1v3 = v1 * v3,
           v2v4 = v2 * v4)

such a beautifully simple language, right?

For more great tricks please see here.

EDIT: Thanks to @Facottons pointer to this answer: https://stackoverflow.com/a/34377242/5088194, here is a tidy approach to resolving this issue. It keeps one from having to write a line to hard code in each new column desired. While it is a bit more verbose than the Base R approach, the logic is at least more immediately transparent/readable. It is also worth noting that there must be at least half as many rows as there are columns for this approach to work.

# prep the product column names (also acting as row numbers)
df <- 
    df %>%
    mutate(prod_grp = paste0("v", row_number(), "v", row_number() + 2)) 

# converting data to tidy format and pairing columns to be multiplied together.
tidy_df <- 
    df %>%
    gather(column, value, -prod_grp) %>% 
    mutate(column = as.numeric(sub("v", "", column)),
           pair = column - 2) %>% 
    mutate(pair = if_else(pair < 1, pair + 2, pair))

# summarize the products for each column
prod_df <- 
    tidy_df %>% 
    group_by(prod_grp, pair) %>% 
    summarize(val = prod(value)) %>% 
    spread(prod_grp, val) %>% 
    mutate(pair = paste0("v", pair, "v", pair + 2)) %>% 
    rename(prod_grp = pair)

# put the original frame and summary frames together
final_df <- 
    df %>% 
    left_join(prod_df) %>% 
    select(-prod_grp)

Upvotes: 36

sbac
sbac

Reputation: 2081

I think I found a solution:

df %>%
  mutate(n = df[1:(ncol(df)/2)] * df[(1+ncol(df)/2):(ncol(df))]) %>% head()

The result is valid for any number of variables. It only remains a problem with the name of the new variables. This is the result:

  v1 v2 v3 v4 n.v1 n.v2
1  1  5  7  4    7   20
2  2  6 10  3   20   18

Upvotes: 3

Morgan Ball
Morgan Ball

Reputation: 790

Just use mutate as is with a comma to separate new columns mutate(df,"v1v3"=v1*v3,"v2v4"= v2*v4)

Upvotes: 2

Related Questions