Karthik

Reputation: 129

Keep only the hours for which the value has not changed within the hour in R

I have a time series dataset with n columns. I would like to filter out the hours in which the value in a column changed within the hour. In other words, I want to keep only the hours in which the value stays unchanged.

Some info about the data:

Expected output:

In the above example, I want to exclude hour 8 from my dataset, as the value in ColA is not constant.

I have a feeling that group_by() and filter() from dplyr might do the job, but I am not sure about the function to find the unchanged values within an hour.

Any help regarding this is much appreciated. Thanks.

Upvotes: 2

Views: 50

Answers (1)

Juan C

Reputation: 6132

This does it:

data1 %>% group_by(Hour_hr)  %>% filter(n_distinct(ColA) < 3)

Checking results (after assigning the filtered result back to data1):

count(data1, Hour_hr)

  Hour_hr     n
    <dbl> <int>
1       7    46
2       9     1

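This keeps an hour if ColA has only one numerical value, or no numerical values at all (only NA), within that hour, so hours 7 and 9 are kept. Note that n_distinct() counts NA as a distinct value by default, which is why the threshold is 3 here: one numerical value plus NA gives 2. An hour with two different numerical values and no NA would also pass this check, so it relies on NA being present in the data.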

Alternatively, you could ignore NA explicitly and require fewer than two distinct values:

data1 %>% group_by(Hour_hr) %>% filter(n_distinct(ColA, na.rm = TRUE) < 2)
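To see how the NA-ignoring filter behaves, here is a minimal sketch on made-up data. The `toy` tibble and its values are hypothetical and not taken from the question; only the column names `Hour_hr` and `ColA` follow the original post.

```r
library(dplyr)

# Hypothetical toy data (not the asker's dataset):
# hour 7 is constant apart from one NA, hour 8 changes, hour 9 has a single row
toy <- tibble(
  Hour_hr = c(7, 7, 7, 8, 8, 9),
  ColA    = c(1, 1, NA, 2, 3, 5)
)

# Keep only the hours in which ColA has at most one distinct non-missing value
kept <- toy %>%
  group_by(Hour_hr) %>%
  filter(n_distinct(ColA, na.rm = TRUE) < 2) %>%
  ungroup()

# Rows remaining per hour
count(kept, Hour_hr)
```

In this toy data hour 8 is dropped and hours 7 and 9 remain. The `n_distinct(ColA) < 3` form would keep hour 8 here, because that hour has exactly two distinct values and no NA.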

Upvotes: 1
