Reputation: 1752
I have a large data frame with groups and subgroups. I would like to determine the index of the subgroup in each group, like shown in the OUTPUT
column of the following data frame:
df <- data.frame(
Group = factor(c("A","A","A","A","A","B","B","B","B")),
Subgroup = factor(c("a","a","b","b","b","a","a","b","b")),
OUTPUT = c(1,1,2,2,2,1,1,2,2)
)
I've tried several possibilities with without any success. I'd like to work with dplyr
, but I'm not sure how to go about this. The following code returns an unexpected result.
require(dplyr)
df <- df %>%
group_by(Group) %>%
mutate(
OUTPUT_2 = dplyr::id(Subgroup)
)
#df
# Group Subgroup OUTPUT_2
# (fctr) (fctr) (int)
#1 A a 8
#2 A a 8
#3 A b 8
#4 A b 8
#5 A b 8
#6 B a 4
#7 B a 4
#8 B b 4
#9 B b 4
I've the feeling I'm close, but not getting there. Can anybody help?
Upvotes: 2
Views: 460
Reputation: 6969
Here is a solution with data.table
without aggregation:
dt[order(Subgroup), Output := cumsum(!duplicated(Subgroup)) , by = .(Group)]
This will be much faster compared to methods based on aggregation.
Upvotes: 3
Reputation: 886948
We can use the factor
route with dplyr
library(dplyr)
df %>%
group_by(Group) %>%
mutate(OUTPUT = as.numeric(factor(Subgroup, levels= unique(Subgroup))))
# Group Subgroup OUTPUT
# <fctr> <fctr> <dbl>
#1 A a 1
#2 A a 1
#3 A b 2
#4 A b 2
#5 A b 2
#6 B a 1
#7 B a 1
#8 B b 2
#9 B b 2
Or another option is match
with the unique
elements of 'Subgroup' after grouping by 'Group'
df %>%
group_by(Group) %>%
mutate(OUTPUT = match(Subgroup, unique(Subgroup)) )
# Group Subgroup OUTPUT
# <fctr> <fctr> <int>
#1 A a 1
#2 A a 1
#3 A b 2
#4 A b 2
#5 A b 2
#6 B a 1
#7 B a 1
#8 B b 2
#9 B b 2
Upvotes: 2
Reputation: 49448
library(data.table)
dt = as.data.table(df) # or setDT to convert in place
unique(dt[, .(Group, Subgroup)])[, idx := 1:.N, by = Group][dt, on = c('Group', 'Subgroup')]
# Group Subgroup idx OUTPUT
#1: A a 1 1
#2: A a 1 1
#3: A b 2 2
#4: A b 2 2
#5: A b 2 2
#6: B a 1 1
#7: B a 1 1
#8: B b 2 2
#9: B b 2 2
Translation to dplyr
should be straightforward.
Another method, following the idea of using factors from aosmith's comment, is:
dt[, idx := as.integer(factor(Subgroup, unique(Subgroup))), by = Group][]
This will create a factor with correct levels per Group which is the indexing you're after.
Upvotes: 1