Reputation: 33970
I have a tbl_df where I want to group_by(u, v) for each distinct integer combination observed with (u, v).
EDIT: this was subsequently resolved by adding the (now-deprecated) group_indices() back in dplyr 0.4.0.
a) I then want to assign each distinct group some arbitrary distinct number label=1,2,3...
e.g. the combination (u,v)==(2,3) could get label 1, (1,3) could get 2, and so on.
How can I do this with one mutate(), without a three-step summarize-and-self-join?
dplyr has a neat function n(), but that gives the number of elements within its group, not the overall number of the group. In data.table this would simply be called .GRP.
b) Actually, what I really want is to assign a string/character label ('A','B',...).
But numbering groups by integers is good enough, because I can then use integer_to_label(i) as below. Unless there's a clever way to merge these two? But don't sweat this part.
set.seed(1234)
# Helper fn for mapping integer 1..26 to character label
integer_to_label <- function(i) { substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ",i,i) }
df <- tibble::as_tibble(data.frame(u=sample.int(3,10,replace=T), v=sample.int(4,10,replace=T)))
# Want to label/number each distinct group of unique (u,v) combinations
df %>% group_by(u,v) %>% mutate(label = n()) # WRONG: n() is the number of elements within the group, not the overall number of the group
u v
1 2 3
2 1 3
3 1 2
4 2 3
5 1 2
6 3 3
7 1 3
8 1 2
9 3 1
10 3 4
KLUDGE 1: could do df %>% group_by(u,v) %>% summarize(label = n()) , then self-join
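For reference, the three-step approach can be sketched as follows. This is only an illustration on a small fixed data frame (the values are made up so the result does not depend on the RNG), and it swaps in distinct() plus row_number() for the summarize step, since n() would yield group sizes rather than group numbers. The name lookup is hypothetical.

```r
library(dplyr)

# Small fixed example (made-up values, not the sampled df above)
df <- tibble(u = c(2L, 1L, 1L, 2L, 3L), v = c(3L, 3L, 2L, 3L, 1L))

# Steps 1 and 2: one row per distinct (u, v) combination, kept in order
# of first appearance, then numbered 1, 2, 3, ...
lookup <- df %>%
  distinct(u, v) %>%
  mutate(label = row_number())

# Step 3: join the labels back onto the original rows
labelled <- df %>% left_join(lookup, by = c("u", "v"))
```

Here (2,3) gets label 1 and (1,3) gets label 2, because the numbering follows the order in which combinations first appear.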
Upvotes: 22
Views: 19321
Reputation: 7969
For current dplyr versions (1.0.0 and higher)
Since version 1.0, dplyr has a new cur_group_id function for that:
df %>%
group_by(u, v) %>%
mutate(label = cur_group_id()) ...
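As for part b) of the question, one possible way to get character labels directly is to index the built-in LETTERS vector with the group id. This is a minimal sketch on made-up data, and it assumes there are at most 26 groups.

```r
library(dplyr)

# Small fixed example (made-up values)
df <- tibble(u = c(2L, 1L, 1L, 2L, 3L), v = c(3L, 3L, 2L, 3L, 1L))

labelled <- df %>%
  group_by(u, v) %>%
  # cur_group_id() is a single integer per group; LETTERS maps 1..26 to "A".."Z"
  mutate(label = LETTERS[cur_group_id()]) %>%
  ungroup()
```

Note that group_by() orders groups by the sorted values of u and v, so the combination (1,2) becomes "A" even though it is not the first row.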
For previous dplyr versions (before 1.0.0; the function is deprecated but still available in 1.0.10)
dplyr has a group_indices() function that you can use like this:
df %>%
mutate(label = group_indices(., u, v)) %>%
group_by(label) ...
Upvotes: 56
Reputation: 23024
As of dplyr version 1.0.4, the function cur_group_id() has replaced the older function group_indices.
Call it on the grouped data.frame:
df %>%
group_by(u, v) %>%
mutate(label = cur_group_id())
# A tibble: 10 x 3
# Groups: u, v [6]
u v label
<int> <int> <int>
1 2 2 4
2 2 2 4
3 1 3 2
4 3 2 6
5 1 4 3
6 1 2 1
7 2 2 4
8 2 4 5
9 3 2 6
10 2 4 5
Upvotes: 9
Reputation: 176
I don't have enough reputation for a comment, so I'm posting an answer instead.
The solution using factor() is a good one, but it has the disadvantage that group numbers are assigned after factor() alphabetizes its levels. The same behaviour happens with dplyr's group_indices(). Perhaps you would like the group numbers to be assigned from 1 to n based on the current group order, in which case you can use:
my_tibble %>% mutate(group_num = as.integer(factor(group_var, levels = unique(.$group_var))) )
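The same idea might be extended to the two grouping columns from the question by building a combined key first. This is a sketch under assumptions: the data is made up, key is a hypothetical helper column, and match() against unique() is swapped in for the factor() call, since it also numbers values by first appearance.

```r
library(dplyr)

# Small fixed example (made-up values)
df <- tibble(u = c(2L, 1L, 1L, 2L, 3L), v = c(3L, 3L, 2L, 3L, 1L))

labelled <- df %>%
  mutate(
    key   = paste(u, v, sep = "."),   # hypothetical helper column, e.g. "2.3"
    label = match(key, unique(key))   # position of the key's first appearance
  ) %>%
  select(-key)
```

With this numbering, the first row's combination always gets label 1, regardless of how the values sort.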
Upvotes: 2
Reputation: 21507
Another approach using data.table would be
require(data.table)
setDT(df)[,label:=.GRP, by = c("u", "v")]
which results in:
u v label
1: 2 1 1
2: 1 3 2
3: 2 1 1
4: 3 4 3
5: 3 1 4
6: 1 1 5
7: 3 2 6
8: 2 3 7
9: 3 2 6
10: 3 4 3
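As the output shows, .GRP numbers groups in order of first appearance rather than in sorted order. If character labels are wanted, one option is to index LETTERS with .GRP. This is a sketch on made-up data, and it assumes at most 26 groups.

```r
library(data.table)

# Small fixed example (made-up values)
dt <- data.table(u = c(2L, 1L, 1L, 2L, 3L), v = c(3L, 3L, 2L, 3L, 1L))

# .GRP is the running group counter; LETTERS maps 1..26 to "A".."Z"
dt[, label := LETTERS[.GRP], by = .(u, v)]
```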
Upvotes: 11
Reputation: 33970
Updating my answer with three different ways:
A) A neat non-dplyr solution using interaction(u,v):
> df$label <- factor(interaction(df$u,df$v, drop=T))
[1] 1.3 2.3 2.2 2.4 3.2 2.4 1.2 1.2 2.1 2.1
Levels: 2.1 1.2 2.2 3.2 1.3 2.3 2.4
> match(df$label, levels(df$label)[ rank(unique(df$label)) ] )
[1] 1 2 3 4 5 4 6 6 7 7
B) Making Randy's neat fast-and-dirty generator-function answer more compact:
get_next_integer = function(){
i = 0
function(u,v){ i <<- i+1 }
}
get_integer = get_next_integer()
df %>% group_by(u,v) %>% mutate(label = get_integer())
C) Also here is a one-liner using a generator function abusing a global variable assignment from this:
i <- 0
generate_integer <- function() { return(assign('i', i+1, envir = .GlobalEnv)) }
df %>% group_by(u,v) %>% mutate(label = generate_integer())
rm(i)
Upvotes: 2
Reputation: 3184
Updated answer
get_group_number = function(){
i = 0
function(){
i <<- i+1
i
}
}
group_number = get_group_number()
df %>% group_by(u,v) %>% mutate(label = group_number())
You can also consider the following slightly unreadable version
group_number = (function(){i = 0; function() i <<- i+1 })()
df %>% group_by(u,v) %>% mutate(label = group_number())
or using the iterators package:
library(iterators)
counter = icount()
df %>% group_by(u,v) %>% mutate(label = nextElem(counter))
Upvotes: 6