Reputation: 1946
My data includes time stamps of activities completed by athletes over different periods. Each period goes for a maximum of twenty minutes. The corresponding weather conditions are noted.
I want to call out the first occurrence of weather and when conditions change. My question is similar to this question except I want the first occurrence and when a change occurs.
My data is structured as follows:
df <- data.frame(Time=c("0:00:00","0:01:00","0:02:40","0:12:09",
"0:00:00", "0:02:07","0:07:19","0:15:16",
"0:00:00", "0:03:00","0:08:40","0:13:29",
"0:00:00", "0:02:10","0:08:47","0:17:55"),
Athlete = c('Paul', 'Paul', 'Paul', 'Paul',
'Paul', 'Paul', 'Paul','Paul',
'Joe', 'Joe', 'Joe', 'Joe',
'Joe', 'Joe', 'Joe', 'Joe'),
Period = c('P1', 'P1', 'P1', 'P1',
'P2', 'P2', 'P2', 'P2',
'P1', 'P1', 'P1', 'P1',
'P2', 'P2', 'P2', 'P2'),
Weather = c('Sunny', 'Sunny', 'Sunny', 'Cloudy',
'Rain', 'Cloudy', 'Rain', 'Rain',
'Rain', 'Sunny', 'Rain', 'Rain',
'Sunny', 'Sunny', 'Cloudy', 'Cloudy'))
I have attempted the code below but this is not returning my anticipated output.
Test <- df[match(unique(df$Weather), df$Weather),]
My anticipated output would be:
Output <- data.frame(Time = c(0.0, 12.09,
0.0, 2.07, 7.19,
0.0, 3.00, 8.40,
0.0, 8.47),
Athlete = c('Paul', 'Paul',
'Paul', 'Paul', 'Paul',
'Joe', 'Joe', 'Joe',
'Joe', 'Joe'),
Period = c('P1', 'P1',
'P2', 'P2', 'P2',
'P1', 'P1', 'P1',
'P2', 'P2'),
Weather = c('Sunny', 'Cloudy',
'Rain', 'Cloudy', 'Rain',
'Rain', 'Sunny', 'Rain',
'Sunny', 'Cloudy'))
From this question, I understand the index of a change in factors within a column can be located, how do I arrange this code to have my desired output?
Thank you.
Upvotes: 0
Views: 204
Reputation: 10203
The not super elegant version:
df %>% filter(c(1,diff(as.numeric(Athlete)))!=0 |
c(1,diff(as.numeric(Weather)))!=0 |
c(1,diff(as.numeric(Period)))!=0)
Note that this requires the data to be sorted the right way.
Upvotes: 1
Reputation: 193527
I'd suggest using something like rleid
from "data.table". With that you can do:
library(data.table)
as.data.table(df)[, ind := sequence(.N), rleid(Athlete, Period, Weather)][ind == 1]
## Time Athlete Period Weather ind
## 1: 0:00:00 Paul P1 Sunny 1
## 2: 0:12:09 Paul P1 Cloudy 1
## 3: 0:00:00 Paul P2 Rain 1
## 4: 0:02:07 Paul P2 Cloudy 1
## 5: 0:07:19 Paul P2 Rain 1
## 6: 0:00:00 Joe P1 Rain 1
## 7: 0:03:00 Joe P1 Sunny 1
## 8: 0:08:40 Joe P1 Rain 1
## 9: 0:00:00 Joe P2 Sunny 1
## 10: 0:08:47 Joe P2 Cloudy 1
Regarding your question about "How do I have time in minutes.seconds? For example: 2.40--that's not really a standard way of expressing time, is it? 0.4 * 60 = 24
, so I wouldn't recommend converting it in that manner.
If you wanted those values as numeric values, perhaps you can split them out into separate columns. For that, I'd recommend my "splitstackshape" package.
library(splitstackshape)
cSplit(as.data.table(df)[
, ind := sequence(.N), rleid(Athlete, Period, Weather)][ind == 1][
, ind := NULL][], "Time", ":")
## Athlete Period Weather Time_1 Time_2 Time_3
## 1: Paul P1 Sunny 0 0 0
## 2: Paul P1 Cloudy 0 12 9
## 3: Paul P2 Rain 0 0 0
## 4: Paul P2 Cloudy 0 2 7
## 5: Paul P2 Rain 0 7 19
## 6: Joe P1 Rain 0 0 0
## 7: Joe P1 Sunny 0 3 0
## 8: Joe P1 Rain 0 8 40
## 9: Joe P2 Sunny 0 0 0
## 10: Joe P2 Cloudy 0 8 47
"Time_1
" would be hours, "Time_2
" would be minutes, and "Time_3
" would be seconds.
If you really wanted the "Time" column the way you described, you could do that with something like:
df$Time <- as.numeric(sub(":", ".", gsub("^[^:]*:", "", df$Time)))
And then, proceed with the "data.table" approach described earlier.
The "tidyverse" approach (still using rleid
from "data.table") might look something like this:
library(tidyverse)
library(data.table)
df %>%
mutate(ind = rleid(Athlete, Period, Weather)) %>%
group_by(ind) %>%
slice(1) %>%
ungroup() %>%
select(-ind) %>%
mutate(Time = as.numeric(sub(":", ".", gsub("^[^:]*:", "", Time))))
# # A tibble: 10 × 4
# Time Athlete Period Weather
# <dbl> <fctr> <fctr> <fctr>
# 1 0.00 Paul P1 Sunny
# 2 12.09 Paul P1 Cloudy
# 3 0.00 Paul P2 Rain
# 4 2.07 Paul P2 Cloudy
# 5 7.19 Paul P2 Rain
# 6 0.00 Joe P1 Rain
# 7 3.00 Joe P1 Sunny
# 8 8.40 Joe P1 Rain
# 9 0.00 Joe P2 Sunny
# 10 8.47 Joe P2 Cloudy
Upvotes: 1