Reputation: 2129
I want to calculate some statistics (mean, min, max, standard deviation etc) of some columns from a dataframe and store these values as another dataframe.
Here's a sample:
>foo
Col1 Col2 Col3 Col4
1 1 6 10 60
2 2 7 20 70
3 3 8 30 80
4 4 9 40 90
5 5 10 50 100
For example, I want to store the mean and minimum value of Col1 and Col3 in a dataframe like this:
>bar
Col1 Col3
Mean 3 30
Min 1 10
I want to do this through a loop, something like this:
# Result dataframe
bar <- data.frame(Col1 = integer(), Col3 = integer())
variables_for_stats <- c("Col1","Col3")
# I want to do something on the lines of this:
for (z in variables_for_stats){
# Populate column with required values
col <- c(mean(foo$z,min(foo$z)) # Throws an error - argument is not numeric or logical: returning NA
# Add col to 'bar'
bar$z<- col # Does not work
}
My actual foo dataframe currently has ~ 40 columns and actual variables_for_stats are around 20. Both of these can change, hence the desire to do this through a for loop and lists. How should I do this ?
Upvotes: 4
Views: 6248
Reputation: 3518
if you're interested in a tidyverse
solution...
library(tidyverse)
foo <- tribble(~Col1, ~Col2, ~Col3, ~Col4,
1, 6, 10, 60,
2, 7, 20, 70,
3, 8, 30, 80,
4, 9, 40, 90,
5, 10, 50, 100)
foo %>%
gather(Col, Value) %>%
group_by(Col) %>%
summarise(Mean = mean(Value), Minimum = min(Value))
#> # A tibble: 4 x 3
#> Col Mean Minimum
#> <chr> <dbl> <dbl>
#> 1 Col1 3 1
#> 2 Col2 8 6
#> 3 Col3 30 10
#> 4 Col4 80 60
Edit If you want the resulting data frame exactly as you pointed out in your question, then:
foo %>%
gather(Col, Value) %>%
group_by(Col) %>%
summarise(Mean = mean(Value),
Minimum = min(Value)) %>%
gather(Func, Value, 2:3) %>%
spread(Col, Value) %>%
select(Func, Col1, Col3)
# A tibble: 2 x 3
# Func Col1 Col3
# <chr> <dbl> <dbl>
#1 Mean 3 30
#2 Minimum 1 10
Upvotes: 3
Reputation: 79208
Using base R, you can do something like:
aggregate( values~ind,stack(foo),function(x)
c(mean=mean(x),sd=sd(x),min=min(x),max=max(x)))#Write all the functions you want
ind values.mean values.sd values.min values.max
1 Col1 3.000000 1.581139 1.000000 5.000000
2 Col2 8.000000 1.581139 6.000000 10.000000
3 Col3 30.000000 15.811388 10.000000 50.000000
4 Col4 80.000000 15.811388 60.000000 100.000000
If at all you only need the summary statistics then:
library(tidyverse)
summary(foo)%>%
data.frame()%>%
select(-Var1)%>%
separate(Freq,c("Fun","Val"),":")%>%
spread(Fun, Val)
Var2 1st Qu. 3rd Qu. Max. Mean Median Min.
1 Col1 2 4 5 3 3 1
2 Col2 7 9 10 8 8 6
3 Col3 20 40 50 30 30 10
4 Col4 70 90 100 80 80 60
Upvotes: 2
Reputation: 15062
You can do this with tidyverse
tools. The actual calculation is just the summarise
, the rest is just to convert the output into your desired format.
library(tidyverse)
foo <- read_table2(
"Col1 Col2 Col3 Col4
1 6 10 60
2 7 20 70
3 8 30 80
4 9 40 90
5 10 50 10"
)
bar <- foo %>%
summarise_at(
.vars = vars(Col1, Col3),
.funs = funs(mean, min)
) %>%
gather(stat, value) %>%
separate(stat, into = c("Col", "Func")) %>%
spread(Col, value)
bar
#> # A tibble: 2 x 3
#> Func Col1 Col3
#> <chr> <dbl> <dbl>
#> 1 mean 3 30
#> 2 min 1 10
Created on 2018-06-04 by the reprex package (v0.2.0).
Upvotes: 1
Reputation: 887028
We can loop over the columns of interest and get the mean
and min
sapply(foo[c('Col1', 'Col3')], function(x) c(Mean = mean(x), Min =min(x)))
# Col1 Col3
#Mean 3 30
#Min 1 10
NOTE: An apply based solution is a loop as well. But, it gives more control than a for
loop in understanding the output
Upvotes: 3