Will_8011

Reputation: 51

How to use group_by (or a similar command) on lists in R

I have to deal with meteorological datasets covering more than a hundred stations. The data structure looks like this (month goes from 1 to 12 and year from 1965 to 2020):

station month year hourlymax
CYBG 1 1965 8
CYBC 1 1965 6
CYKG 1 1965 3.5
CYBG 1 1965 2
CYBC 1 1965 3.5
CYKG 1 1965 4

I used the split function, Stations <- split(all_stations, all_stations$station), to split this big dataset by station. I am now wondering whether it is possible to apply a given function to every dataset in the list. For example, I want to get the monthly mean of a variable. I tried this code (the list is named Stations):

for (i in 1:length(Stations)) {
  group_by(month) %>%
  summarise(result = mean(hourlymax) )
}

There might be better ways of splitting the data in the first place; I don't know any...

Any help/comment is REALLY appreciated! I'm quite new and learning!

Upvotes: 1

Views: 39

Answers (1)

Dave2e

Reputation: 24149

Since you have already split the original data frame into a list of data frames, this is a good fit for lapply or sapply, depending on whether you want the results as a list or a vector. For example, the overall mean of hourlymax for each station:

result_vector <- sapply(Stations, function(x) {
  mean(x$hourlymax)
})
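
The sapply call above returns one overall mean per station. If the goal is the monthly mean within each station, as the question asks, a minimal base R sketch could look like the following. The toy data reuses the sample rows from the question, with the last row's year assumed to be 1965, and the name monthly_means is only illustrative.

```r
# Sample rows from the question (last row's year assumed to be 1965)
all_stations <- data.frame(
  station   = c("CYBG", "CYBC", "CYKG", "CYBG", "CYBC", "CYKG"),
  month     = c(1, 1, 1, 1, 1, 1),
  year      = c(1965, 1965, 1965, 1965, 1965, 1965),
  hourlymax = c(8, 6, 3.5, 2, 3.5, 4)
)

# Split into a named list with one data frame per station
Stations <- split(all_stations, all_stations$station)

# For each station, compute the mean of hourlymax per month.
# lapply keeps the result as a list of small data frames.
monthly_means <- lapply(Stations, function(x) {
  aggregate(hourlymax ~ month, data = x, FUN = mean)
})

monthly_means$CYBG
```

Each element of monthly_means is a data frame with a month column and a hourlymax column holding the mean. If a single table is more convenient, the list can be stacked afterwards, for instance with dplyr::bind_rows(monthly_means, .id = "station").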

Or, if you want to use the dplyr approach, skip the split and use group_by on the original data frame:

result_df <- all_stations %>% group_by(month) %>%
  summarise(result = mean(hourlymax))
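
Note that grouping by month alone pools all stations together. If one mean per station and month is wanted, a possible variant is to group by both columns. This is a sketch using the sample rows from the question (last row's year assumed to be 1965); station_month_means is an illustrative name, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample rows from the question (last row's year assumed to be 1965)
all_stations <- data.frame(
  station   = c("CYBG", "CYBC", "CYKG", "CYBG", "CYBC", "CYKG"),
  month     = c(1, 1, 1, 1, 1, 1),
  year      = c(1965, 1965, 1965, 1965, 1965, 1965),
  hourlymax = c(8, 6, 3.5, 2, 3.5, 4)
)

# One row per station/month combination, no split() needed
station_month_means <- all_stations %>%
  group_by(station, month) %>%
  summarise(result = mean(hourlymax), .groups = "drop")

station_month_means
```

This keeps everything in a single data frame, which is usually easier to filter or plot than a list of per-station results.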

Upvotes: 1
