I need to scale values only within certain categories. Essentially I have a data.frame with individuals, years, and several predictors.
Individual | Year | Uplift |
---|---|---|
A | 2013 | 0.76280999 |
A | 2013 | 1.01930776 |
A | 2015 | 0.00000000 |
B | 2011 | 0.78427964 |
B | 2013 | 0.00000000 |
B | 2013 | 1.37627043 |
I need to scale my predictors within each individual and year; in other words, standardize Individual A in 2013, Individual A in 2015, and so on, for 33 individuals and 86 thousand rows. Rather than splitting the data into a separate data frame for each individual/year combination, I tried a dplyr solution:
```r
library("dplyr")
data %>%
  group_by(Individual, Year) %>%
  mutate(data, std_uplift = scale(uplift) %>%
    ungroup())
```
This throws an error:

```
Error: Problem with `mutate()` input `..1`.
x Input `..1` can't be recycled to size 1100.
i Input `..1` is `data`.
i Input `..1` must be size 1100 or 1, not 83670.
i The error occurred in group 1: Individual = "A", year = "2013".
```
I don't understand how to fix this. It seems to be trying to put the data from all individuals into a single group, and I suspect there is a better way to scale data within categories. How can I make this work?
Thanks!
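The error comes from the extra `data` in `mutate(data, ...)`. Inside a pipe, the grouped data frame is already passed to `mutate()` as its first argument, so `data` becomes an additional unnamed input (`..1`) containing all 83670 rows, and that cannot be recycled to the 1100 rows of the first group. Drop it, keep `ungroup()` outside `mutate()`, and refer to the column with its actual capitalization (`Uplift`). `scale()` returns a one-column matrix, so wrapping it in `as.numeric()` gives a plain numeric column:

```r
library("dplyr")

df <- tribble(
  ~Individual, ~Year, ~Uplift,
  "A", 2013, 0.76280999,
  "A", 2013, 1.01930776,
  "A", 2015, 0.00000000,
  "B", 2011, 0.78427964,
  "B", 2013, 0.00000000,
  "B", 2013, 1.37627043
)

df %>%
  group_by(Individual, Year) %>%                      # one group per individual/year
  mutate(std_Uplift = as.numeric(scale(Uplift))) %>%  # standardize within each group
  ungroup()
```

```
# A tibble: 6 x 4
  Individual  Year Uplift std_Uplift
  <chr>      <dbl>  <dbl>      <dbl>
1 A           2013  0.763     -0.707
2 A           2013  1.02       0.707
3 A           2015  0        NaN
4 B           2011  0.784    NaN
5 B           2013  0         -0.707
6 B           2013  1.38       0.707
```

Groups that contain a single row (A in 2015 and B in 2011 here) have no within-group spread, so their standardized value is `NaN`.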
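The same per-group standardization can also be done in base R, which is handy when several predictor columns need it at once. This is a minimal sketch, assuming the column names from the sample table; `predictors` is a hypothetical vector that you would extend with the names of your other predictor columns. `ave()` applies a function within every combination of its grouping vectors and returns the results in the original row order:

```r
# Sample data from the question (only Uplift is shown there)
df <- data.frame(
  Individual = c("A", "A", "A", "B", "B", "B"),
  Year       = c(2013, 2013, 2015, 2011, 2013, 2013),
  Uplift     = c(0.76280999, 1.01930776, 0, 0.78427964, 0, 1.37627043)
)

# Hypothetical list of columns to standardize; add your other predictors here
predictors <- c("Uplift")

# For each predictor, ave() splits the column by every Individual/Year
# combination, z-scores each piece with scale(), and puts the values back
# in the original row order. Single-row groups come out as NaN.
df[paste0("std_", predictors)] <- lapply(
  df[predictors],
  function(x) ave(x, df$Individual, df$Year,
                  FUN = function(v) as.numeric(scale(v)))
)

df
```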