Username
Username

Reputation: 3663

When making a calculated column, how do I sum across columns with dplyr?

I have a dataframe called df that looks like this

GENDER,CHOCOLATE,VANILLA,...A_BUNCH_OF_COLUMNS_HERE...,ORANGUTAN
M,11,22,...,33
F,22,33,...,44
M,22,11,...,22
F,44,55,...,66

I want to make a new column called PERCENTAGE_ORANGUTAN which would be calculated as ORANGUTAN / SUM(previous_columns)

I know how to make a calculated column withdplyr if I know the names of the columns before ORANGUTAN. But if I know not the names or number of columns before ORANGUTAN, how do I calculate PERCENTAGE_ORANGUTAN? Or without dplyr?

Upvotes: 1

Views: 165

Answers (2)

jeremycg
jeremycg

Reputation: 24945

In base:

First we will remove all columns that are not numeric:

df1 <- df[,sapply(df, is.numeric)]

Then subset by those that have an index greater than 'ORANGUTAN', and divide:

df$new <- df1$ORANGUTAN / rowSums(df1[ , 1:which(names(df1) == 'ORANGUTAN')])

giving:

[1] 0.5000000 0.4444444 0.4000000 0.4000000

data:

df = read.table(text = "GENDER,CHOCOLATE,VANILLA,ORANGUTAN
M,11,22,33
F,22,33,44
M,22,11,22
F,44,55,66", sep = ',', header = TRUE)

Upvotes: 0

eipi10
eipi10

Reputation: 93761

There may be a more "dplyr-ish" idiom, but if you know you want all the columns before the ORANGUTAN column, you could use match to select them. For example, using the built-in mtcars data frame:

mtcars %>% 
   mutate(pct = wt/rowSums(.[ , 1:(match("wt", names(.))-1)]))

The above code creates a new column called pct by dividing the wt column by the sum of all the columns preceding it (which is non-sensical and just for illustration).

. is a "pronoun" that refers to the mtcars data frame.

We want the sum of columns 1 through the column before wt. match returns the numerical index of wt in the vector of column names of mtcars. That value happens to be 6, so we subtract 1 from it to get the sum of columns 1 through 5.

Upvotes: 4

Related Questions