Reputation: 3663
I have a dataframe called df
that looks like this
GENDER,CHOCOLATE,VANILLA,...A_BUNCH_OF_COLUMNS_HERE...,ORANGUTAN
M,11,22,...,33
F,22,33,...,44
M,22,11,...,22
F,44,55,...,66
I want to make a new column called PERCENTAGE_ORANGUTAN
which would be calculated as ORANGUTAN / SUM(previous_columns)
I know how to make a calculated column withdplyr
if I know the names of the columns before ORANGUTAN
. But if I know not the names or number of columns before ORANGUTAN
, how do I calculate PERCENTAGE_ORANGUTAN
? Or without dplyr
?
Upvotes: 1
Views: 165
Reputation: 24945
In base:
First we will remove all columns that are not numeric:
df1 <- df[,sapply(df, is.numeric)]
Then subset by those that have an index greater than 'ORANGUTAN', and divide:
df$new <- df1$ORANGUTAN / rowSums(df1[ , 1:which(names(df1) == 'ORANGUTAN')])
giving:
[1] 0.5000000 0.4444444 0.4000000 0.4000000
data:
df = read.table(text = "GENDER,CHOCOLATE,VANILLA,ORANGUTAN
M,11,22,33
F,22,33,44
M,22,11,22
F,44,55,66", sep = ',', header = TRUE)
Upvotes: 0
Reputation: 93761
There may be a more "dplyr-ish" idiom, but if you know you want all the columns before the ORANGUTAN
column, you could use match
to select them. For example, using the built-in mtcars
data frame:
mtcars %>%
mutate(pct = wt/rowSums(.[ , 1:(match("wt", names(.))-1)]))
The above code creates a new column called pct
by dividing the wt
column by the sum of all the columns preceding it (which is non-sensical and just for illustration).
.
is a "pronoun" that refers to the mtcars
data frame.
We want the sum of columns 1 through the column before wt
. match
returns the numerical index of wt
in the vector of column names of mtcars
. That value happens to be 6, so we subtract 1 from it to get the sum of columns 1 through 5.
Upvotes: 4