Reputation: 2081
I have the following data frame df
:
v1 v2 v3 v4
1 1 5 7 4
2 2 6 10 3
And I want to obtain the following data frame df2
multiplying columns v1*v3 and v2*v4:
v1 v2 v3 v4 v1v3 v2v4
1 1 5 7 4 7 20
2 2 6 10 3 20 18
How can I do that using dplyr
? Using mutate_each
?
I need a solution that can be generalized to a large number of variables and not only 4 (v1 to v4). This is the code to generate the example:
v1 <- c(1, 2)
v2 <- c(5,6)
v3 <- c(7, 10)
v4 <- c(4, 3)
df <- data.frame(v1, v2, v3, v4)
v1v3 <- c(v1 * v3)
v2v4 <- c(v2 * v4)
df2 <- cbind(df, v1v3, v2v4)
Upvotes: 24
Views: 92844
Reputation: 151
I just found out!!!
In my case, I did:
mutate (log2 (across (starts_with ("ratio"), .names = "log2_{.col}")))
So I would transform to logarithm only the columns whose names started with "ratio". The new columns will have the same name as the originals, but their names will be preceded by the "log2_" prefix.
Upvotes: 3
Reputation: 389215
We can use base R
instead of using any extra packages like dplyr
or data.table
We can use mapply
to vectorize the operation for multiple vectors at the same time
n <- ncol(df)/2
mapply(`*`, df[1:n], df[(n + 1):ncol(df)])
# v1 v2
#[1,] 7 20
#[2,] 20 18
We can merge (cbind
) this dataframe to your original one then.
If you are interested in tidyverse
solution the equivalent in purrr
would be variants of map2
purrr::map2_df(df[1:n], df[(n + 1):ncol(df)], `*`)
# A tibble: 2 x 2
# v1 v2
# <dbl> <dbl>
#1 7 20
#2 20 18
Upvotes: 4
Reputation: 14988
You are really close.
df2 <-
df %>%
mutate(v1v3 = v1 * v3,
v2v4 = v2 * v4)
such a beautifully simple language, right?
For more great tricks please see here.
EDIT: Thanks to @Facottons pointer to this answer: https://stackoverflow.com/a/34377242/5088194, here is a tidy approach to resolving this issue. It keeps one from having to write a line to hard code in each new column desired. While it is a bit more verbose than the Base R approach, the logic is at least more immediately transparent/readable. It is also worth noting that there must be at least half as many rows as there are columns for this approach to work.
# prep the product column names (also acting as row numbers)
df <-
df %>%
mutate(prod_grp = paste0("v", row_number(), "v", row_number() + 2))
# converting data to tidy format and pairing columns to be multiplied together.
tidy_df <-
df %>%
gather(column, value, -prod_grp) %>%
mutate(column = as.numeric(sub("v", "", column)),
pair = column - 2) %>%
mutate(pair = if_else(pair < 1, pair + 2, pair))
# summarize the products for each column
prod_df <-
tidy_df %>%
group_by(prod_grp, pair) %>%
summarize(val = prod(value)) %>%
spread(prod_grp, val) %>%
mutate(pair = paste0("v", pair, "v", pair + 2)) %>%
rename(prod_grp = pair)
# put the original frame and summary frames together
final_df <-
df %>%
left_join(prod_df) %>%
select(-prod_grp)
Upvotes: 36
Reputation: 2081
I think I found a solution:
df %>%
mutate(n = df[1:(ncol(df)/2)] * df[(1+ncol(df)/2):(ncol(df))]) %>% head()
The result is valid for any number of variables. It only remains a problem with the name of the new variables. This is the result:
v1 v2 v3 v4 n.v1 n.v2
1 1 5 7 4 7 20
2 2 6 10 3 20 18
Upvotes: 3
Reputation: 790
Just use mutate as is with a comma to separate new columns mutate(df,"v1v3"=v1*v3,"v2v4"= v2*v4)
Upvotes: 2