OKra

Reputation: 15

dplyr - arrange() according to two criteria per group

I have hourly weather data collected for hundreds of farms over a period of five weeks before a sampling event. I want to determine the average Air_Temp for the three weeks prior to the sampling event. Currently, my data are out of order. I want to group by farm (denoted by File) and then have all of the data in ascending order by Date and Hour; in other words, I want the rows within each File to be in chronological order. Here is an example of my data (a data frame called Weather):

              File Status Hour Air_Temp Dew_Temp Pressure Wind_Dir Wind_Speed Sky Rain_1 Rain_6       Date
1 results_1_farm-19      1   21     24.1     16.5       NA      190        2.1   7     NA     NA 2013-01-14
2 results_1_farm-19      1   22     23.0     16.8       NA        0        0.0   4     NA     NA 2013-01-14
3 results_1_farm-19      1   23     19.8     16.4       NA        0        0.0   0     NA     NA 2013-01-14
4 results_1_farm-19      1    0     17.4     15.8       NA        0        0.0   0     NA     NA 2013-01-15
5 results_1_farm-19      1    1     19.0     17.2       NA      170        1.5   0     NA     NA 2013-01-15

This excerpt looks sorted, but if you scroll through the full data you'll see that the dates are out of order.
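For anyone who wants to reproduce this, here is a sketch that enters the five printed rows as a data frame. The column types are my guesses from the printout, and it assumes Date should be a real Date column (parsed with as.Date()) so that sorting on it is chronological; it uses base R only.

```r
# The five rows printed above, entered as a data frame.
# Assumption: column types are guessed from the printout; Date is parsed
# with as.Date() so that it sorts chronologically.
Weather <- data.frame(
  File       = rep("results_1_farm-19", 5),
  Status     = rep(1L, 5),
  Hour       = c(21L, 22L, 23L, 0L, 1L),
  Air_Temp   = c(24.1, 23.0, 19.8, 17.4, 19.0),
  Dew_Temp   = c(16.5, 16.8, 16.4, 15.8, 17.2),
  Pressure   = NA_real_,
  Wind_Dir   = c(190L, 0L, 0L, 0L, 170L),
  Wind_Speed = c(2.1, 0.0, 0.0, 0.0, 1.5),
  Sky        = c(7L, 4L, 0L, 0L, 0L),
  Rain_1     = NA_real_,
  Rain_6     = NA_real_,
  Date       = as.Date(c("2013-01-14", "2013-01-14", "2013-01-14",
                         "2013-01-15", "2013-01-15")),
  stringsAsFactors = FALSE
)

str(Weather)
```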

So I'm trying to use dplyr to arrange the data by Date and Hour with this:

library(dplyr)

Weather1 <- Weather %>%
  group_by(File) %>%
  arrange(Date, Hour)

However, it seems that arrange() has ignored the group_by(). In some cases I have data for two farms with the same Date and Hour. Instead of sorting each farm's rows in order, it has sorted the whole data frame by Date and Hour.

Am I misunderstanding what group_by will do? Thank you for any help.
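A minimal reproduction of the behaviour described above, using a small hypothetical two-farm dataset (farm_A, farm_B and the temperatures are made up for illustration). Because arrange() on a grouped data frame ignores the grouping by default, the sort runs across all rows and the farms end up interleaved:

```r
library(dplyr)

# Hypothetical toy data: two farms with identical Date/Hour stamps,
# stored out of chronological order.
toy <- data.frame(
  File     = c("farm_B", "farm_A", "farm_B", "farm_A"),
  Date     = as.Date(c("2013-01-15", "2013-01-15", "2013-01-14", "2013-01-14")),
  Hour     = c(0L, 0L, 23L, 23L),
  Air_Temp = c(17.4, 16.9, 19.8, 20.1),
  stringsAsFactors = FALSE
)

# group_by() only records the grouping; arrange() ignores it by default,
# so the sort is by Date and Hour across all farms.
res <- toy %>%
  group_by(File) %>%
  arrange(Date, Hour)

# The two 2013-01-14 rows (one per farm) come first, so each farm's rows
# are split up instead of being kept together.
print(res$File)
```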

Upvotes: 1

Views: 1432

Answers (3)

Nico Coallier

Reputation: 686

In addition to my comments, you can also do the following:

sorted <- Weather %>% 
          arrange(Date, Hour) %>%
          group_by(File)

Upvotes: 0

Matt Jewett

Reputation: 3379

group_by shouldn't be necessary for this; it's typically used when you want to perform some kind of aggregation on your data. arrange will sort first by File, then by Date within each File, then by Hour within each Date. This should give you the structure you're looking for.

Weather1 <- Weather %>%
  arrange(File, Date, Hour)
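A quick check of this approach on a small hypothetical dataset (two made-up farms, farm_A and farm_B, that share the same Date/Hour stamps). Listing File first keeps each farm's rows together, and Date and Hour then order the rows within each farm:

```r
library(dplyr)

# Hypothetical toy data: two farms sharing the same Date/Hour stamps,
# stored out of order.
toy <- data.frame(
  File     = c("farm_B", "farm_A", "farm_B", "farm_A"),
  Date     = as.Date(c("2013-01-15", "2013-01-15", "2013-01-14", "2013-01-14")),
  Hour     = c(0L, 0L, 23L, 23L),
  Air_Temp = c(17.4, 16.9, 19.8, 20.1),
  stringsAsFactors = FALSE
)

# Sort by File first, then Date within File, then Hour within Date.
# No group_by() is needed just to reorder rows.
sorted <- toy %>%
  arrange(File, Date, Hour)

print(sorted)
```

Each (File, Date, Hour) combination here is unique, so the resulting order does not depend on how ties are broken.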

Upvotes: 1

mt1022

Reputation: 17309

I am using version ‘0.5.0.9001’ of dplyr (a pre-release of 0.6.0). The new version will be released soon.

For a grouped data frame, arrange() ignores the grouping information by default:

## S3 method for class 'grouped_df'
arrange(.data, ..., .by_group = FALSE)

So you have to set .by_group = TRUE explicitly to make arrange() sort by the grouping variable first:

Weather1 <- Weather %>%
    group_by(File) %>%
    arrange(Date, Hour, .by_group = TRUE)
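A sketch, assuming dplyr 0.7.0 or later (the first release with .by_group), that checks this gives the same row order as listing File explicitly in arrange(). The toy data are hypothetical:

```r
library(dplyr)  # .by_group requires dplyr >= 0.7.0

# Hypothetical toy data: two farms sharing the same Date/Hour stamps,
# stored out of order.
toy <- data.frame(
  File     = c("farm_B", "farm_A", "farm_B", "farm_A"),
  Date     = as.Date(c("2013-01-15", "2013-01-15", "2013-01-14", "2013-01-14")),
  Hour     = c(0L, 0L, 23L, 23L),
  Air_Temp = c(17.4, 16.9, 19.8, 20.1),
  stringsAsFactors = FALSE
)

# Grouped sort: .by_group = TRUE sorts by the grouping variable (File) first.
by_group <- toy %>%
  group_by(File) %>%
  arrange(Date, Hour, .by_group = TRUE) %>%
  ungroup()

# Ungrouped sort with File listed explicitly.
explicit <- toy %>%
  arrange(File, Date, Hour)

# Both approaches should give the same row order.
print(identical(by_group$File, explicit$File) &&
        identical(by_group$Date, explicit$Date) &&
        identical(by_group$Hour, explicit$Hour))
```

Which one to use is mostly a matter of taste: .by_group = TRUE is handy when the data are already grouped for a later summarise(), while arrange(File, Date, Hour) does not depend on the grouping at all.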

Upvotes: 1
