Reputation: 1946
I have power data (Power) collected every second (Sample). My data.frame is therefore structured as follows:
Test <- data.frame(Sample = c(1:20),
Power = c(0,0,0,0,0,50,67,100,92,0,0,0,36,89,36,0,0,0,89,90))
The number of power entries is dependent upon a human performing an effort on a bike and resting sporadically. Therefore, power does not appear in an ordered fashion. As there are no markers to indicate when an effort starts and stops, I want to include this detail. An effort can be characterised when power > 0 and the start/ stop of each effort can be assessed based on data group together.
I now wish to include a new column (Marker) that looks for power data grouped together and separated by zeroes. For example, my anticipated output would be:
Test$Marker <- c("Rest","Rest","Rest","Rest","Rest","Effort 1","Effort 1","Effort 1","Effort 1",
"Rest","Rest","Rest","Effort 2","Effort 2","Effort 2","Rest","Rest","Rest",
"Effort 3","Effort 3")
Unfortunately my raw data is > 3000 rows long, so to do this manually would be tedious! How do I please go about doing this in R?
Upvotes: 3
Views: 112
Reputation: 6542
An option with dplyr
from the tidyverse
:
library(dplyr)
Test <- data.frame(Sample = c(1:20),
Power = c(0,0,0,0,0,50,67,100,92,0,0,0,36,89,36,0,0,0,89,90))
Test_df <- Test %>%
mutate(
Marker = case_when(
Power > 0 ~ "Effort",
Power == 0 ~"Rest"),
rleid = cumsum(Marker != lag(Marker, 1, default = "NA")),
Marker = case_when(
Marker == "Effort" ~ paste0(Marker, rleid %/% 2),
TRUE ~ "Rest"),
rleid = NULL
)
Test_df
#> Sample Power Marker
#> 1 1 0 Rest
#> 2 2 0 Rest
#> 3 3 0 Rest
#> 4 4 0 Rest
#> 5 5 0 Rest
#> 6 6 50 Effort1
#> 7 7 67 Effort1
#> 8 8 100 Effort1
#> 9 9 92 Effort1
#> 10 10 0 Rest
#> 11 11 0 Rest
#> 12 12 0 Rest
#> 13 13 36 Effort2
#> 14 14 89 Effort2
#> 15 15 36 Effort2
#> 16 16 0 Rest
#> 17 17 0 Rest
#> 18 18 0 Rest
#> 19 19 89 Effort3
#> 20 20 90 Effort3
An other option using a one-liner data.table
:
library(data.table)
Test <- data.frame(Sample = c(1:20),
Power = c(0,0,0,0,0,50,67,100,92,0,0,0,36,89,36,0,0,0,89,90))
setDT(Test)
Test[, Marker := ifelse(Power > 0, paste0("Effort", rleidv(Power > 0) %/% 2), "Rest")]
Test
#> Sample Power Marker
#> 1: 1 0 Rest
#> 2: 2 0 Rest
#> 3: 3 0 Rest
#> 4: 4 0 Rest
#> 5: 5 0 Rest
#> 6: 6 50 Effort1
#> 7: 7 67 Effort1
#> 8: 8 100 Effort1
#> 9: 9 92 Effort1
#> 10: 10 0 Rest
#> 11: 11 0 Rest
#> 12: 12 0 Rest
#> 13: 13 36 Effort2
#> 14: 14 89 Effort2
#> 15: 15 36 Effort2
#> 16: 16 0 Rest
#> 17: 17 0 Rest
#> 18: 18 0 Rest
#> 19: 19 89 Effort3
#> 20: 20 90 Effort3
Upvotes: 2
Reputation: 25385
An option with base R:
indx1 = with(rle(Test$Power>0),rep(values,lengths))
indx2 = with(rle(Test$Power>0),rep(cumsum(values),lengths))
Test$Effort[indx1] = paste0("Effort",indx2[indx1])
Test$Effort[!indx1]="Rest"
Output:
Sample Power Effort
1 1 0 Rest
2 2 0 Rest
3 3 0 Rest
4 4 0 Rest
5 5 0 Rest
6 6 50 Effort1
7 7 67 Effort1
8 8 100 Effort1
9 9 92 Effort1
10 10 0 Rest
11 11 0 Rest
12 12 0 Rest
13 13 36 Effort2
14 14 89 Effort2
15 15 36 Effort2
16 16 0 Rest
17 17 0 Rest
18 18 0 Rest
19 19 89 Effort3
20 20 90 Effort3
About 0.0038 seconds for 3,000 rows ;) Hope this helps!
Upvotes: 5
Reputation: 93813
An alternative base R version using cumsum
:
mrk <- Test$Power==0
Test$New[!mrk] <- paste("effort", as.numeric(factor(cumsum(mrk)[!mrk])))
Test$New[mrk] <- "rest"
# Sample Power Marker New
#1 1 0 Rest rest
#2 2 0 Rest rest
#3 3 0 Rest rest
#4 4 0 Rest rest
#5 5 0 Rest rest
#6 6 50 Effort 1 effort 1
#7 7 67 Effort 1 effort 1
#8 8 100 Effort 1 effort 1
#9 9 92 Effort 1 effort 1
#10 10 0 Rest rest
#11 11 0 Rest rest
#12 12 0 Rest rest
#13 13 36 Effort 2 effort 2
#14 14 89 Effort 2 effort 2
#15 15 36 Effort 2 effort 2
#16 16 0 Rest rest
#17 17 0 Rest rest
#18 18 0 Rest rest
#19 19 89 Effort 3 effort 3
#20 20 90 Effort 3 effort 3
Upvotes: 3