Reputation: 3812
I have a column in my data frame which looks like this:
> df
# A tibble: 20 x 1
duration
<dbl>
1 0
2 40.0
3 247.
4 11.8
5 116.
6 10.2
7 171.
8 7.58
9 87.8
10 23.2
11 390.
12 35.8
13 4.73
14 29.1
15 0
16 36.8
17 73.8
18 12.9
19 124.
20 10.7
I need to group this data, so that all rows starting from a 0 to the last row before the next zero are in a group. I've accomplished this using a for-loop:
counter <- 0
df$group <- NA
df$group[1] <- 1
for (i in 2:NROW(df)) {
df$group[i] <-
ifelse(df$duration[i] == 0, df$group[i - 1] + 1, df$group[i - 1])
}
which gives me the desired output:
> df
# A tibble: 20 x 2
duration group
<dbl> <dbl>
1 0 1
2 40.0 1
3 247. 1
4 11.8 1
5 116. 1
6 10.2 1
7 171. 1
8 7.58 1
9 87.8 1
10 23.2 1
11 390. 1
12 35.8 1
13 4.73 1
14 29.1 1
15 0 2
16 36.8 2
17 73.8 2
18 12.9 2
19 124. 2
20 10.7 2
But as my original dataframe is quite big i'm looking for a faster solution, and I've been trying to get it working with dplyr but to no avail. Other related questions are counting how often the current value has already appeared, not a specific one so I haven't found a solution to this problem yet.
I'd appreaciate your help in finding a vectorized solution for my problem, thanks! Heres the example-data:
df <-
structure(
list(
duration = c(
0,
40.0009999275208,
247.248000144958,
11.8349997997284,
115.614000082016,
10.2449998855591,
171.426000118256,
7.58200001716614,
87.805999994278,
23.1909999847412,
390.417999982834,
35.8229999542236,
4.73100018501282,
29.0869998931885,
0,
36.789999961853,
73.8420000076294,
12.8770000934601,
123.771999835968,
10.7190001010895
)
),
row.names = c(NA,-20L),
class = c("tbl_df", "tbl", "data.frame")
)
Upvotes: 0
Views: 84
Reputation: 26363
We can create the desired column using cumsum
as below
df %>%
mutate(grp = cumsum(duration == 0))
# A tibble: 20 x 2
# duration grp
# <dbl> <int>
# 1 0 1
# 2 40.0 1
# 3 247. 1
# 4 11.8 1
# 5 116. 1
# 6 10.2 1
# 7 171. 1
# 8 7.58 1
# 9 87.8 1
#10 23.2 1
#11 390. 1
#12 35.8 1
#13 4.73 1
#14 29.1 1
#15 0 2
#16 36.8 2
#17 73.8 2
#18 12.9 2
#19 124. 2
#20 10.7 2
Upvotes: 1