Reputation: 456
This is my dataset.
num col1
1 SENSOR_01
2 SENSOR_01
3 SENSOR_01
4 SENSOR_05
5 SENSOR_05
6 SENSOR_05
7 NA
8 SENSOR_01
9 SENSOR_01
10 SENSOR_05
11 SENSOR_05
structure(list(num = 1:11, col1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, NA, 1L, 1L, 2L, 2L), .Label = c("SENSOR_01", "SENSOR_05" ), class = "factor"), count = c(3L, 3L, 3L, 3L, 3L, 3L, 0L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, -11L))
I would like to count for only previous duplicated rows. In the row 1-3, there are sensor 3 repeatedly 3 times so count = 3. Here is my expected outcome.
num col1 count
1 SENSOR_01 3
2 SENSOR_01 3
3 SENSOR_01 3
4 SENSOR_05 3
5 SENSOR_05 3
6 SENSOR_05 3
7 NA 1
8 SENSOR_01 2
9 SENSOR_01 2
10 SENSOR_05 2
11 SENSOR_05 2
Using dplyr, How can I make this outcome?
Upvotes: 1
Views: 89
Reputation: 4520
Like an option, we can use order of variables (rownames
in traditional data.frame
). The idea is simple:
In tidyverse
:
dat %>%
mutate(tmp = 1:n()) %>%
group_by(col1) %>%
add_count(tmp = cumsum(c(0, diff(tmp)) > 1)) %>%
ungroup() %>%
select(-tmp)
# # A tibble: 11 x 3
# num col1 n
# <int> <fct> <int>
# 1 1 SENSOR_01 3
# 2 2 SENSOR_01 3
# 3 3 SENSOR_01 3
# 4 4 SENSOR_05 3
# 5 5 SENSOR_05 3
# 6 6 SENSOR_05 3
# 7 7 NA 1
# 8 8 SENSOR_01 2
# 9 9 SENSOR_01 2
# 10 10 SENSOR_05 2
# 11 11 SENSOR_05 2
Data:
dat <- structure(
list(
num = 1:11,
col1 = structure(
c(1L, 1L, 1L, 2L, 2L, 2L, NA, 1L, 1L, 2L, 2L),
.Label = c("SENSOR_01", "SENSOR_05" ),
class = "factor")
),
class = "data.frame",
row.names = c(NA, -11L)
)
Upvotes: 3
Reputation: 887108
We can use base R
with rle
to create the 'count' column
df$count <- with(rle(df$col1), rep(lengths, lengths))
df$count
#[1] 3 3 3 3 3 3 1 2 2 2 2
Or the dplyr
implementation of the above
library(dplyr)
df %>%
mutate(count = with(rle(col1), rep(lengths, lengths)))
Or an option with tidyverse
without including any other packages
library(dplyr)
df %>%
group_by(grp = replace_na(col1, "VALUE"),
grp = cumsum(grp != lag(grp, default = first(grp)))) %>%
mutate(count = n()) %>%
ungroup %>%
select(-grp)
# A tibble: 11 x 3
# num col1 count
# <int> <chr> <int>
# 1 1 SENSOR_01 3
# 2 2 SENSOR_01 3
# 3 3 SENSOR_01 3
# 4 4 SENSOR_05 3
# 5 5 SENSOR_05 3
# 6 6 SENSOR_05 3
# 7 7 <NA> 1
# 8 8 SENSOR_01 2
# 9 9 SENSOR_01 2
#10 10 SENSOR_05 2
#11 11 SENSOR_05 2
df <- structure(list(num = 1:11, col1 = c("SENSOR_01", "SENSOR_01",
"SENSOR_01", "SENSOR_05", "SENSOR_05", "SENSOR_05", NA, "SENSOR_01",
"SENSOR_01", "SENSOR_05", "SENSOR_05")),
class = "data.frame", row.names = c(NA,
-11L))
Upvotes: 2
Reputation: 388982
We can use rleid
to create groups and then count number of rows in each group.
library(dplyr)
df %>%
group_by(group = data.table::rleid(col1)) %>%
mutate(n = n()) %>%
ungroup() %>%
dplyr::select(-group)
# A tibble: 11 x 4
# num col1 count n
# <int> <fct> <int> <int>
# 1 1 SENSOR_01 3 3
# 2 2 SENSOR_01 3 3
# 3 3 SENSOR_01 3 3
# 4 4 SENSOR_05 3 3
# 5 5 SENSOR_05 3 3
# 6 6 SENSOR_05 3 3
# 7 7 NA 1 1
# 8 8 SENSOR_01 2 2
# 9 9 SENSOR_01 2 2
#10 10 SENSOR_05 2 2
#11 11 SENSOR_05 2 2
Keeping both the columns for comparison purposes.
Or using data.table
library(data.table)
setDT(df)[, n := .N, by = rleid(col1)]
Upvotes: 4