How to use group_by (or a similar command) on lists in R

Question

I have to deal with meteorological datasets covering more than a hundred stations. The data structure looks like this (month goes from 1 to 12 and year from 1965 to 2020):

station	month	year	hourlymax
CYBG	1	1965	8
CYBC	1	1965	6
CYKG	1	1965	3.5
CYBG	1	1965	2
CYBC	1	1965	3.5
CYKG	1	1965	4

I used the function split, Stations <- split(all_stations, all_stations$station), to split this big dataset by station. I am now wondering whether it is possible to apply a function to every data frame in the list. For example, I want to get the monthly mean of a variable. I tried the following code (the list is named Stations):

for (i in 1:length(Stations)) {
  group_by(month) %>%
  summarise(result = mean(hourlymax) )
}

There might be better ways of splitting the data in the first place, but I don't know of any...

Any help/comment is REALLY appreciated! I'm quite new and learning!

Dave2e · Accepted Answer

Since you have already split the original data frame into a list of data frames, this is a good fit for lapply or sapply, depending on whether you want the results as a list or a vector. For example, this returns the mean of hourlymax for each station as a named vector:

result_vector <- sapply(Stations, function(x) { 
     mean(x$hourlymax)
})
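
The sapply call above gives one overall mean per station. If you want the mean per month within each station, one possible base R sketch is to call aggregate inside lapply. The toy data below only mirrors the question's column names; the values are made up for illustration, and monthly_means is a name I chose.

```r
# Toy data with the same columns as in the question (values are illustrative only).
all_stations <- data.frame(
  station   = c("CYBG", "CYBC", "CYKG", "CYBG", "CYBC", "CYKG"),
  month     = c(1, 1, 1, 2, 2, 2),
  year      = 1965,
  hourlymax = c(8, 6, 3.5, 2, 3.5, 4)
)

# Same split as in the question: one data frame per station.
Stations <- split(all_stations, all_stations$station)

# For each station, compute the mean of hourlymax within each month.
# The result is a list of small data frames, named by station.
monthly_means <- lapply(Stations, function(x) {
  aggregate(hourlymax ~ month, data = x, FUN = mean)
})

monthly_means$CYBG
```

lapply keeps the station names as list names, so monthly_means$CYBG (or monthly_means[["CYBG"]]) gives the monthly table for that station.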

Or, if you prefer the dplyr approach, use group_by on the original data frame instead of the split list. Note that grouping by month alone pools all stations together:

library(dplyr)

result_df <- all_stations %>% group_by(month) %>%
  summarise(result = mean(hourlymax))
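
If the goal is a monthly mean for each station separately, a minimal sketch is to group by both station and month, which avoids the split step entirely. This assumes dplyr 1.0.0 or later for the .groups argument; the toy data and the name monthly_by_station are mine, not from the question.

```r
library(dplyr)

# Toy data with the same columns as in the question (values are illustrative only).
all_stations <- data.frame(
  station   = c("CYBG", "CYBC", "CYKG", "CYBG", "CYBC", "CYKG"),
  month     = c(1, 1, 1, 2, 2, 2),
  year      = 1965,
  hourlymax = c(8, 6, 3.5, 2, 3.5, 4)
)

# One row per station/month combination, holding the mean of hourlymax.
monthly_by_station <- all_stations %>%
  group_by(station, month) %>%
  summarise(result = mean(hourlymax), .groups = "drop")

monthly_by_station
```

Adding year to group_by would give one mean per station, month, and year, if that is what you need.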
