user2716568
user2716568

Reputation: 1946

Extract rows for change in variable, according to condition

My data includes time stamps of activities completed by athletes over different periods. Each period goes for a maximum of twenty minutes. The corresponding weather conditions are noted.

I want to call out the first occurrence of weather and when conditions change. My question is similar to this question except I want the first occurrence and when a change occurs.

My data is structured as follows:

  df <- data.frame(Time=c("0:00:00","0:01:00","0:02:40","0:12:09",
                        "0:00:00", "0:02:07","0:07:19","0:15:16",
                        "0:00:00", "0:03:00","0:08:40","0:13:29",
                        "0:00:00", "0:02:10","0:08:47","0:17:55"),
                 Athlete = c('Paul', 'Paul', 'Paul', 'Paul',
                             'Paul', 'Paul', 'Paul','Paul',
                            'Joe', 'Joe', 'Joe', 'Joe',
                            'Joe', 'Joe', 'Joe', 'Joe'),
                 Period = c('P1', 'P1', 'P1', 'P1',
                            'P2', 'P2', 'P2', 'P2',
                            'P1', 'P1', 'P1', 'P1',
                            'P2', 'P2', 'P2', 'P2'),
                 Weather = c('Sunny', 'Sunny', 'Sunny', 'Cloudy',
                            'Rain', 'Cloudy', 'Rain', 'Rain',
                            'Rain', 'Sunny', 'Rain', 'Rain',
                            'Sunny', 'Sunny', 'Cloudy', 'Cloudy'))
  1. How do I call out the first occurrence and change in weather, according to each athlete and period?
  2. How do I have time in minutes.seconds? For example: 2.40

I have attempted the code below but this is not returning my anticipated output.

Test <- df[match(unique(df$Weather), df$Weather),]

My anticipated output would be:

Output <- data.frame(Time = c(0.0, 12.09, 
                              0.0, 2.07, 7.19, 
                              0.0, 3.00, 8.40, 
                              0.0, 8.47), 
                     Athlete = c('Paul', 'Paul', 
                                 'Paul', 'Paul', 'Paul',
                                 'Joe', 'Joe', 'Joe', 
                                 'Joe', 'Joe'), 
                     Period = c('P1', 'P1', 
                                'P2', 'P2', 'P2',
                                'P1', 'P1', 'P1', 
                                'P2', 'P2'), 
                     Weather = c('Sunny', 'Cloudy', 
                                 'Rain', 'Cloudy', 'Rain', 
                                 'Rain', 'Sunny', 'Rain', 
                                 'Sunny', 'Cloudy'))

From this question, I understand the index of a change in factors within a column can be located, how do I arrange this code to have my desired output?

Thank you.

Upvotes: 0

Views: 204

Answers (2)

RoyalTS
RoyalTS

Reputation: 10203

The not super elegant version:

df %>% filter(c(1,diff(as.numeric(Athlete)))!=0 |
              c(1,diff(as.numeric(Weather)))!=0 |
              c(1,diff(as.numeric(Period)))!=0) 

Note that this requires the data to be sorted the right way.

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

I'd suggest using something like rleid from "data.table". With that you can do:

library(data.table)
as.data.table(df)[, ind := sequence(.N), rleid(Athlete, Period, Weather)][ind == 1]
##        Time Athlete Period Weather ind
##  1: 0:00:00    Paul     P1   Sunny   1
##  2: 0:12:09    Paul     P1  Cloudy   1
##  3: 0:00:00    Paul     P2    Rain   1
##  4: 0:02:07    Paul     P2  Cloudy   1
##  5: 0:07:19    Paul     P2    Rain   1
##  6: 0:00:00     Joe     P1    Rain   1
##  7: 0:03:00     Joe     P1   Sunny   1
##  8: 0:08:40     Joe     P1    Rain   1
##  9: 0:00:00     Joe     P2   Sunny   1
## 10: 0:08:47     Joe     P2  Cloudy   1

Regarding your question about "How do I have time in minutes.seconds? For example: 2.40--that's not really a standard way of expressing time, is it? 0.4 * 60 = 24, so I wouldn't recommend converting it in that manner.

If you wanted those values as numeric values, perhaps you can split them out into separate columns. For that, I'd recommend my "splitstackshape" package.

library(splitstackshape)
cSplit(as.data.table(df)[
  , ind := sequence(.N), rleid(Athlete, Period, Weather)][ind == 1][
    , ind := NULL][], "Time", ":")
##     Athlete Period Weather Time_1 Time_2 Time_3
##  1:    Paul     P1   Sunny      0      0      0
##  2:    Paul     P1  Cloudy      0     12      9
##  3:    Paul     P2    Rain      0      0      0
##  4:    Paul     P2  Cloudy      0      2      7
##  5:    Paul     P2    Rain      0      7     19
##  6:     Joe     P1    Rain      0      0      0
##  7:     Joe     P1   Sunny      0      3      0
##  8:     Joe     P1    Rain      0      8     40
##  9:     Joe     P2   Sunny      0      0      0
## 10:     Joe     P2  Cloudy      0      8     47

"Time_1" would be hours, "Time_2" would be minutes, and "Time_3" would be seconds.

If you really wanted the "Time" column the way you described, you could do that with something like:

df$Time <- as.numeric(sub(":", ".", gsub("^[^:]*:", "", df$Time)))

And then, proceed with the "data.table" approach described earlier.


The "tidyverse" approach (still using rleid from "data.table") might look something like this:

library(tidyverse)
library(data.table)

df %>%
  mutate(ind = rleid(Athlete, Period, Weather)) %>%
  group_by(ind) %>%
  slice(1) %>%
  ungroup() %>%
  select(-ind) %>%
  mutate(Time = as.numeric(sub(":", ".", gsub("^[^:]*:", "", Time))))
# # A tibble: 10 × 4
#     Time Athlete Period Weather
#    <dbl>  <fctr> <fctr>  <fctr>
# 1   0.00    Paul     P1   Sunny
# 2  12.09    Paul     P1  Cloudy
# 3   0.00    Paul     P2    Rain
# 4   2.07    Paul     P2  Cloudy
# 5   7.19    Paul     P2    Rain
# 6   0.00     Joe     P1    Rain
# 7   3.00     Joe     P1   Sunny
# 8   8.40     Joe     P1    Rain
# 9   0.00     Joe     P2   Sunny
# 10  8.47     Joe     P2  Cloudy

Upvotes: 1

Related Questions