Reputation: 77
C1 C2
------
a 11
a 2
a 2
b 2
b 34
c 2
c 4
c 1
d 4
How can I get the index of the first occurrence of each group name?
For example, in column C1 the first occurrence of 'b' is at index 4. I need the index of the first occurrence of every group in the same way.
Upvotes: 4
Views: 448
Reputation: 887048
Using ave
with(df, which(as.logical(ave(seq_along(C1), C1,
FUN = function(x) x == x[1]))))
#[1] 1 4 6 9
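To see why this works, here is a step-by-step sketch of the same call. It assumes df holds the data from the question; the names pos and is_first are only illustrative and not part of the original answer.

```r
# Assumption: df is the data frame from the question.
df <- data.frame(C1 = c("a","a","a","b","b","c","c","c","d"),
                 C2 = c(11, 2, 2, 2, 34, 2, 4, 1, 4))

pos <- seq_along(df$C1)   # row positions 1..9

# Within each C1 group, compare every position with the group's first one.
# ave() writes the result back in the original row order (as 1/0 here).
is_first <- ave(pos, df$C1, FUN = function(x) x == x[1])
is_first
#> [1] 1 0 0 1 0 1 0 0 1

# Positions flagged with 1 are the first row of each group.
which(as.logical(is_first))
#> [1] 1 4 6 9
```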
Upvotes: 0
Reputation: 101247
Try tapply + head, like below:
with(
df,
tapply(seq_along(C1), C1, head, 1)
)
which gives
a b c d
1 4 6 9
Or we can use aggregate
> aggregate(cbind(idx = seq_along(C1)) ~ C1, df, head, 1)
C1 idx
1 a 1
2 b 4
3 c 6
4 d 9
Upvotes: 2
Reputation: 5673
To add to the existing answers: with base R, using tapply:
dt$I <- 1:nrow(dt)
tapply(dt$I, dt$C1, function(x) x[1])
a b c d
1 4 6 9
If you want two columns, the group and the index, with dplyr you could use cur_group_rows(), the equivalent of .I in data.table; see https://dplyr.tidyverse.org/reference/context.html?q=grp#data-table
dt %>%
group_by(C1) %>%
summarise(index = cur_group_rows()[1])
# A tibble: 4 x 2
C1 index
<fct> <int>
1 a 1
2 b 4
3 c 6
4 d 9
library(microbenchmark)
denis = function(){
tapply(dt$I, dt$C1, function(x) x[1])
}
mt1022 = function(){
which(!duplicated(dt$C1))
}
microbenchmark(mt1022(),denis())
Unit: microseconds
expr min lq mean median uq max neval cld
mt1022() 19.5 23.7 46.705 29.9 48.9 525.2 100 a
denis() 61.7 66.0 124.323 89.5 133.1 735.3 100 b
@mt1022's method is much faster: roughly three times faster by median (29.9 vs 89.5 microseconds).
library(dplyr)
library(data.table)
mt1022_datatable = function(){
as.data.table(dt)[, .(index = .I[1]), by = .(C1)]
}
jmpivette = function(){
dt %>%
mutate(r_number = row_number()) %>%
group_by(C1) %>%
summarise(r_number[1])
}
denis_dplyr = function(){
dt %>%
group_by(C1) %>%
summarise(index = cur_group_rows()[1])
}
microbenchmark(mt1022_datatable(),jmpivette(),denis_dplyr())
Unit: milliseconds
expr min lq mean median uq max neval cld
mt1022_datatable() 1.4469 1.72520 2.234030 2.01225 2.30720 8.9519 100 a
jmpivette() 6.6528 7.31915 10.029003 7.94435 8.89835 56.7763 100 c
denis_dplyr() 4.4943 4.92120 7.057608 5.38290 6.13925 41.9592 100 b
Here you see the advantage of data.table: its median (about 2.0 ms) is well below both dplyr variants (about 5.4 ms and 7.9 ms).
data:
dt <- read.table(text = "C1 C2
a 11
a 2
a 2
b 2
b 34
c 2
c 4
c 1
d 4
", header = TRUE)
Upvotes: 1
Reputation: 275
library(dplyr)
df <- data.frame(C1 = c("a","a","a","b","b","c","c","c","d"),
C2 = c(11,2,2,2,34,2,4,1,4))
df %>%
mutate(r_number = row_number()) %>%
group_by(C1) %>%
summarise(index = min(r_number))
#> # A tibble: 4 x 2
#> C1 index
#> <chr> <int>
#> 1 a 1
#> 2 b 4
#> 3 c 6
#> 4 d 9
Upvotes: 3
Reputation: 17289
With the data.table package, you can get it with .I:
library(data.table)
# dtt is assumed to be the data frame from the question
as.data.table(dtt)[, .(index = .I[1]), by = .(C1)]
# C1 index
# 1: a 1
# 2: b 4
# 3: c 6
# 4: d 9
If only the indices are needed:
which(!duplicated(dtt$C1))
[1] 1 4 6 9
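If the group labels are wanted alongside the indices, a minimal base R sketch could attach them as names. It assumes df holds the question's data; first_idx and named are illustrative names, and the match() line is an alternative spelling of the same idea rather than part of the answer above.

```r
# Assumption: df is the data frame from the question.
df <- data.frame(C1 = c("a","a","a","b","b","c","c","c","d"),
                 C2 = c(11, 2, 2, 2, 34, 2, 4, 1, 4))

# duplicated() is FALSE the first time a value appears, so negating it
# and taking which() yields the first row of each group.
first_idx <- which(!duplicated(df$C1))

# Label each index with its group name.
named <- setNames(first_idx, as.character(df$C1[first_idx]))
named
#> a b c d
#> 1 4 6 9

# match() returns the position of the first match, giving the same indices.
match(unique(df$C1), df$C1)
#> [1] 1 4 6 9
```

Unlike tapply, this keeps the groups in order of first appearance rather than sorted by label, which only matters when the data are not already sorted by group.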
Upvotes: 5