I want to group my data frame by year, standardize certain columns (in this case BioTest, MathExam, and WritingScore), and replace the old values with the standardized ones. Below is an example of my data frame:
DF:
Var1 Var2 Year BioTest MathExam WritingScore Var3 Var 4
X X 2016 165 140 10 X X
X X 2017 172 128 11 X X
X X 2018 169 115 8 X X
X X 2016 166 139 10 X X
X X 2017 165 140 12 X X
I have tried variations of the following code:
DF<- DF %>% group_by(Year)%>% mutate(across(BioTest:WritingScore),scale)
DF<- DF %>% group_by(Year)%>% mutate(across(select(BioTest:WritingScore)),scale)
What I get in return is the same DF without any changes. What I want is:
DF:
Var1 Var2 Year BioTest MathExam WritingScore Var3 Var 4
X X 2016 NewData NewData NewData X X
X X 2017 NewData NewData NewData X X
X X 2018 NewData NewData NewData X X
X X 2016 NewData NewData NewData X X
X X 2017 NewData NewData NewData X X
Any help is much appreciated.
Maybe try this. The issue is in your across() statement: the function must be passed inside across() (as its .fns argument), not as a separate argument to mutate():
library(dplyr)
#Code
DF %>%
group_by(Year) %>%
mutate(across(BioTest:WritingScore,~scale(.)[,1]))
Output:
# A tibble: 5 x 9
# Groups: Year [3]
Var1 Var2 Year BioTest MathExam WritingScore Var3 Var X4
<chr> <chr> <int> <dbl> <dbl> <dbl> <chr> <chr> <lgl>
1 X X 2016 -0.707 0.707 NaN X X NA
2 X X 2017 0.707 -0.707 -0.707 X X NA
3 X X 2018 NaN NaN NaN X X NA
4 X X 2016 0.707 -0.707 NaN X X NA
5 X X 2017 -0.707 0.707 0.707 X X NA
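To make the lambda ~scale(.)[,1] concrete, here is a small base-R sketch (the helper name z_score is hypothetical; the input values come from the example data) that checks it against the textbook z-score (x - mean(x)) / sd(x). It also shows where the NaN values above come from: the 2018 group has only one row, and WritingScore is constant (10, 10) in 2016, so in both cases the centred values and the divisor that scale() uses are all 0, and 0/0 is NaN.

```r
# Hypothetical helper: the same computation as the lambda ~scale(.)[,1]
z_score <- function(x) scale(x)[, 1]

# BioTest values for Year 2017 from the example data
bio_2017 <- c(172, 165)

# scale() centres on the mean and divides by the sample standard deviation,
# so it should agree with the manual z-score formula
by_scale  <- z_score(bio_2017)
by_manual <- (bio_2017 - mean(bio_2017)) / sd(bio_2017)
print(all.equal(by_scale, by_manual))

# One-row group (Year 2018) and a constant column (WritingScore in 2016):
# the centred values and the divisor are both 0, so 0 / 0 gives NaN
single_row <- z_score(169)
constant   <- z_score(c(10, 10))
print(single_row)
print(constant)
```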
Some data used:
#Data
DF <- structure(list(Var1 = c("X", "X", "X", "X", "X"), Var2 = c("X",
"X", "X", "X", "X"), Year = c(2016L, 2017L, 2018L, 2016L, 2017L
), BioTest = c(165L, 172L, 169L, 166L, 165L), MathExam = c(140L,
128L, 115L, 139L, 140L), WritingScore = c(10L, 11L, 8L, 10L,
12L), Var3 = c("X", "X", "X", "X", "X"), Var = c("X", "X", "X",
"X", "X"), X4 = c(NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-5L))
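If you want to cross-check the dplyr result without dplyr (and without any risk of a masked mutate()), one base-R alternative is stats::ave(), which applies a function within each group and returns a vector in the original row order. This is a different technique from the across() answer, shown as a minimal sketch on a cut-down copy of the data above:

```r
# Cut-down copy of the example data (only the columns needed here)
DF <- data.frame(
  Year         = c(2016L, 2017L, 2018L, 2016L, 2017L),
  BioTest      = c(165L, 172L, 169L, 166L, 165L),
  MathExam     = c(140L, 128L, 115L, 139L, 140L),
  WritingScore = c(10L, 11L, 8L, 10L, 12L)
)

cols <- c("BioTest", "MathExam", "WritingScore")

# ave() splits each column by Year, applies FUN per group and puts the
# results back in the original row order; as.numeric() drops scale()'s
# matrix attributes so each group yields a plain numeric vector
DF[cols] <- lapply(DF[cols], function(col) {
  ave(col, DF$Year, FUN = function(v) as.numeric(scale(v)))
})

print(DF)
```

The standardized values should match the dplyr output above, including NaN for the one-row 2018 group and for the constant 2016 WritingScore.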
The issue could be that dplyr::mutate was masked by plyr::mutate. It can be reproduced as follows (note also that across is closed without a function):
iris %>%
group_by(Species) %>%
plyr::mutate(across(where(is.numeric), scale))
# A tibble: 150 x 5
# Groups: Species [3]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <fct>
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
# 7 4.6 3.4 1.4 0.3 setosa
# 8 5 3.4 1.5 0.2 setosa
# 9 4.4 2.9 1.4 0.2 setosa
#10 4.9 3.1 1.5 0.1 setosa
# … with 140 more rows
which is identical to the original iris dataset.
Now, check with the correct dplyr::mutate
iris %>%
group_by(Species) %>%
dplyr::mutate(across(where(is.numeric), scale))
# A tibble: 150 x 5
# Groups: Species [3]
# Sepal.Length[,1] Sepal.Width[,1] Petal.Length[,1] Petal.Width[,1] Species
# <dbl> <dbl> <dbl> <dbl> <fct>
# 1 0.267 0.190 -0.357 -0.436 setosa
# 2 -0.301 -1.13 -0.357 -0.436 setosa
# 3 -0.868 -0.601 -0.933 -0.436 setosa
# 4 -1.15 -0.865 0.219 -0.436 setosa
# 5 -0.0170 0.454 -0.357 -0.436 setosa
# 6 1.12 1.25 1.37 1.46 setosa
# 7 -1.15 -0.0739 -0.357 0.512 setosa
# 8 -0.0170 -0.0739 0.219 -0.436 setosa
# 9 -1.72 -1.39 -0.357 -0.436 setosa
#10 -0.301 -0.865 0.219 -1.39 setosa
# … with 140 more rows
So, in the OP's code, we just need to use dplyr::mutate, or restart a fresh R session with only dplyr loaded:
DF %>%
  group_by(Year) %>%
  dplyr::mutate(across(BioTest:WritingScore, scale))
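Why the dplyr:: prefix helps: R resolves an unqualified name by walking the search path, so whichever package was attached last (e.g. plyr attached after dplyr) supplies mutate(), and plyr::mutate() silently ignores unnamed arguments, which is why the unnamed across(...) call left the data unchanged. The masking mechanism can be shown without plyr installed. In this minimal sketch, hypothetical_pkg and its nchar() are made up to stand in for plyr and its mutate():

```r
# A stand-in "package" whose nchar() masks base::nchar()
# (hypothetical_pkg and its nchar() are made up for illustration)
masker <- new.env()
assign("nchar", function(x, ...) -1L, envir = masker)

# attach() prints a "masked from 'package:base'" message, much like
# the one shown when plyr is loaded after dplyr
attach(masker, name = "hypothetical_pkg")

# find() lists every search-path entry that defines the name; the first
# one is what an unqualified call resolves to
where_found <- find("nchar")
print(where_found)

masked_result    <- nchar("abc")        # calls the masking version
qualified_result <- base::nchar("abc")  # the :: prefix skips the mask
print(c(masked = masked_result, qualified = qualified_result))

# Restore the original search path
detach("hypothetical_pkg")
```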
scale returns a matrix with some attributes. If we only need the numeric vector part, we can use either as.vector or as.numeric:
DF %>%
  group_by(Year) %>%
  dplyr::mutate(across(BioTest:WritingScore, ~ as.numeric(scale(.))))
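To see concretely what as.numeric() strips, here is a small base-R sketch that inspects the value scale() returns for one group (BioTest in 2016, values taken from the example data):

```r
m <- scale(c(165, 166))

# scale() returns an n x 1 matrix that also carries the centering and
# scaling constants as attributes
print(dim(m))
print(attr(m, "scaled:center"))
print(attr(m, "scaled:scale"))

# as.numeric() and as.vector() both keep only the values
v1 <- as.numeric(m)
v2 <- as.vector(m)
print(v1)
print(attributes(v1))
```

The matrix shape is what makes tibble print headers such as Sepal.Length[,1] in the dplyr::mutate output above; converting to a plain vector gives ordinary column names.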
NOTE: select is not needed within across.