Reputation: 4408
Background
I have a dataset, df, where I would like to aggregate multiple columns and create a new column. I need to multiply Type, Span and Population columns and create a new Output column
ID Status Type Span State Population
A Yes 2 70% Ga 10000
Desired output
ID Status Type Span State Population Output
A Yes 2 70% Ga 10000 14000
dput
structure(list(ID = structure(1L, .Label = "A ", class = "factor"),
Status = structure(1L, .Label = "Yes", class = "factor"),
Type = 2L, Span = structure(1L, .Label = "70%", class = "factor"),
State = structure(1L, .Label = "Ga", class = "factor"), Population = 10000L), class = "data.frame",
row.names = c(NA,
-1L))
This is what I have tried
df %>%
mutate(Output = Type * Span * Population)
Upvotes: 1
Views: 132
Reputation: 389325
We can remove the '%'
sign using sub
, convert to numeric and multiply values.
This can be done in base R as :
df$output <- with(df, Type * as.numeric(sub('%', '', Span)) * Population/100)
df
# ID Status Type Span State Population output
#1 A Yes 2 70% Ga 10000 14000
Upvotes: 2
Reputation: 887971
Here, we are creating a new column based on the inputs from different column. We can just use mutate
to get the Span
percent of Population
and multiply by 'Type'. Note that 'Span' is not numeric, as it is having %
, so we extract the numeric part with parse_number
divide by 100, then multiply with Population along with the 'Type'
library(dplyr)
df %>%
mutate(Output = Type * Population * readr::parse_number(as.character(Span))/100)
# ID Status Type Span State Population Output
#1 A Yes 2 70% Ga 10000 14000
If the columns 'Type', 'Population' are not numeric, it is better to convert to numeric
with as.numeric(as.character(df$Type))
and for 'Population' (assuming they are factor
class). Another option is type.convert(df, as.is = TRUE)
and then work on that modified class dataset
Upvotes: 2