Reputation: 11
Using the iris data as an example, there are three species of iris: setosa, versicolor and virginica. I want to normalize their measurements (such as Sepal.Length) separately within each species. I know a simple but tedious way to do it. Is there a simpler way to achieve this? My process:
data(iris)
library(dplyr)
normalize <- function(x) {
  return((x - mean(x)) / (max(x) - min(x)))
}
data1 <- sapply(filter(iris, Species == 'setosa')[1:4], normalize)
data2 <- sapply(filter(iris, Species == 'versicolor')[1:4], normalize)
data3 <- sapply(filter(iris, Species == 'virginica')[1:4], normalize)
Species <- rep(c('setosa', 'versicolor', 'virginica'), each = 50)
thedata <- rbind(data1, data2, data3)
theirisdata <- data.frame(thedata, Species)
The final data frame "theirisdata" has the same structure as iris, but Sepal.Length, Sepal.Width, Petal.Length and Petal.Width are normalized within each species group. I need a more concise way to handle this kind of problem. For example, the rows of a data frame might fall into 10 or more groups, and a function has to be applied to each column within each group.
Upvotes: 1
Views: 1476
Reputation: 1027
You can use group_by in dplyr to apply functions to each group individually, and then modify multiple columns in place with mutate_each:
data(iris)
library(dplyr)
normalize <- function(x) {
  return((x - mean(x)) / (max(x) - min(x)))
}
my_data <- iris %>%
  group_by(Species) %>%
  mutate_each(funs(normalize))
Check that it returns the same values as your original result:
all(my_data == theirisdata)
[1] TRUE
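Note that in newer dplyr releases (1.0.0 and later, as far as I know) mutate_each and funs are deprecated in favour of across. Below is a minimal sketch of the same grouped normalization under that assumption. Selecting columns with where(is.numeric) and the final ungroup() are my own choices here, not part of the code above:

```r
library(dplyr)

# Same normalization as above: centre on the mean, scale by the range.
normalize <- function(x) {
  (x - mean(x)) / (max(x) - min(x))
}

my_data <- iris %>%
  group_by(Species) %>%                             # one group per species
  mutate(across(where(is.numeric), normalize)) %>%  # applied per group, per numeric column
  ungroup()                                         # drop grouping so later steps are not grouped
```

This scales to any number of groups, since group_by handles the splitting and across handles the columns.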
Upvotes: 1