Reputation: 51
I had to deal with meteorological datasets with more than a hundred stations. The data structure looks like this (month goes from 1 to 12 and year from 1965 to 2020):
station | month | year | hourlymax |
---|---|---|---|
CYBG | 1 | 1965 | 8 |
CYBC | 1 | 1965 | 6 |
CYKG | 1 | 1965 | 3.5 |
CYBG | 1 | 1965 | 2 |
CYBC | 1 | 1965 | 3.5 |
CYKG | 1 | 1965 | 4 |
I used the split function to divide this big dataset by station:
Stations <- split(all_stations, all_stations$station)
I am now wondering whether it is possible to apply a given function to every data frame in the list. For example, I want to get the monthly mean of a variable. I tried the following code (the list name is Stations):
for (i in 1:length(Stations)) {
group_by(month) %>%
summarise(result = mean(hourlymax) )
}
There might be better ways of splitting the data in the first place, but I don't know any...
Any help/comment is REALLY appreciated! I'm quite new and learning!
Upvotes: 1
Views: 39
Reputation: 24149
Since you have already split the original data frame into a list of data frames, this is a good fit for lapply or sapply, depending on whether you want the results as a list or a vector. For example, sapply returns the overall mean of hourlymax for each station as a named vector:
result_vector <- sapply(Stations, function(x) {
mean(x$hourlymax)
})
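If the goal is a mean per month within each station, rather than one overall mean per station, the same pattern can be sketched with lapply. The small all_stations data frame below is made up purely for illustration, and aggregate() is just one base R option for the per-month step, not necessarily the best fit for the real data.

```r
# Toy data for illustration only (made-up values, not the real dataset)
all_stations <- data.frame(
  station   = c("CYBG", "CYBC", "CYKG", "CYBG", "CYBC", "CYKG"),
  month     = c(1, 1, 1, 1, 1, 1),
  year      = c(1965, 1965, 1965, 1966, 1966, 1966),
  hourlymax = c(8, 6, 3.5, 2, 3.5, 4)
)

# Split into a named list with one data frame per station
Stations <- split(all_stations, all_stations$station)

# For each station, average hourlymax within each month
monthly_means <- lapply(Stations, function(x) {
  aggregate(hourlymax ~ month, data = x, FUN = mean)
})

# Each list element is a small data frame with columns month and hourlymax
monthly_means$CYBG
```

Because split() names the list elements by station, the result can be indexed by station code, as in the last line.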
Or, if you prefer the dplyr approach, use group_by on the original data frame:
result_df <- all_stations %>% group_by(month) %>%
summarise(result = mean(hourlymax))
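Note that grouping by month alone pools every station together. If the monthly means should stay separate per station, one possible variant is to group by both columns, which also makes the split() step unnecessary. This is a sketch on made-up toy data; the name result_by_station is illustrative, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy data for illustration only (made-up values, not the real dataset)
all_stations <- data.frame(
  station   = c("CYBG", "CYBC", "CYKG", "CYBG", "CYBC", "CYKG"),
  month     = c(1, 1, 1, 1, 1, 1),
  year      = c(1965, 1965, 1965, 1966, 1966, 1966),
  hourlymax = c(8, 6, 3.5, 2, 3.5, 4)
)

# One row per station/month combination, averaged over all years
result_by_station <- all_stations %>%
  group_by(station, month) %>%
  summarise(result = mean(hourlymax), .groups = "drop")

result_by_station
```

The output is a single tidy data frame, which is usually easier to filter or plot than a list of per-station results.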
Upvotes: 1