Reputation: 53
I am making several graphs based on the clustering data from DAPC. I need the colors to be the same across all the graphs, and I'd like to use specific colors for the largest groups. The key point for this question is that DAPC gives me a data set like this:
my_df <- data.frame(
ID = c(1:10),
Group = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c")
)
> my_df
ID Group
1 a
2 b
3 b
4 c
5 a
6 b
7 a
8 b
9 b
10 c
I know how to find the group with the most members like this:
freqs <- table(my_df$Group)
freqs <- freqs[order(freqs, decreasing = TRUE)]
> freqs
b a c
5 3 2
Is there a way to change the values based on their frequency? Each time I rerun DAPC, it changes the groups, so I'd like to write code that does this automatically instead of having to redo it manually. Here's how I'd like the dataframe to be changed:
> my_df        > my_new_df
ID Group       ID Group
1  a           1  '2nd'
2  b           2  '1st'
3  b           3  '1st'
4  c           4  '3rd'
5  a           5  '2nd'
6  b           6  '1st'
7  a           7  '2nd'
8  b           8  '1st'
9  b           9  '1st'
10 c           10 '3rd'
Upvotes: 1
Views: 475
Reputation: 73512
You may use ave and create a factor out of it with the corresponding labels=. To avoid hard-coding, define the labels in a vector lb beforehand.
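# Ordinal labels; extend the vector if there can be more than 10 groups
lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))
# ave() gives every row the size of its group; factor() then labels the distinct
# sizes in increasing order, hence rev() so that the largest size becomes "1st"
with(my_df, factor(as.numeric(ave(as.character(Group), as.character(Group), FUN=table)),
                   labels=rev(lb[1:length(unique(table(Group)))])))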
# [1] 2nd 1st 1st 3rd 2nd 1st 2nd 1st 1st 3rd
# Levels: 3rd 2nd 1st
To convert more columns like this, use sapply.
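# selected.columns is a placeholder that is not defined here; it is assumed to be
# a vector of the names (or indices) of the columns to convert
sapply(my_df[selected.columns], function(x) {
  factor(as.numeric(ave(as.character(x), as.character(x), FUN=table)),
         labels=rev(lb[1:length(unique(table(x)))]))
})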
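As a variation on the above, here is a minimal base R sketch that uses sort() and match() instead of ave(). It gives every group its own rank even when two groups have the same size, whereas the ave() version gives tied groups the same label. The names ranked, rank_cols and point_cols and the three colors are made up for illustration; ties are broken in whatever order sort() returns them.

```r
my_df <- data.frame(
  ID = 1:10,
  Group = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c")
)
lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))

# Group names ordered from most to least frequent (here: b, a, c)
ranked <- names(sort(table(my_df$Group), decreasing = TRUE))

# The position of each row's group in that ordering picks its label
my_new_df <- my_df
my_new_df$Group <- lb[match(my_df$Group, ranked)]

# Hypothetical palette keyed by rank label, so it can be reused after a DAPC rerun
rank_cols <- c("1st" = "firebrick", "2nd" = "steelblue", "3rd" = "darkgreen")
point_cols <- rank_cols[my_new_df$Group]
```

Because the palette is keyed by "1st", "2nd", ... rather than by the raw group names, the largest group keeps the same color no matter which name DAPC assigns to it on a given run.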
Upvotes: 1
Reputation: 39613
Do you mean something like this:
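library(dplyr)
# Attach each group's size N, sort the rows by it (largest group first), then drop N
my_df %>% left_join(my_df %>% group_by(Group) %>% summarise(N=n()), by="Group") %>%
arrange(desc(N)) %>% select(-N)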
ID Group
1 2 B
2 3 B
3 6 B
4 8 B
5 9 B
6 1 A
7 5 A
8 7 A
9 4 C
10 10 C
Update
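This version relabels the groups instead of reordering the rows: each group, taken in order of first appearance, is given the name of the group with the matching frequency rank.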
my_df %>% left_join(my_df %>% group_by(Group) %>% summarise(N=n()) %>% arrange(desc(N)) %>%
bind_cols(my_df %>% select(Group) %>% distinct() %>% rename(key=Group)) %>%
rename(NewGroup=Group,Group=key)) %>%
select(-c(Group,N)) %>% rename(Group=NewGroup)
ID Group
1 1 B
2 2 A
3 3 A
4 4 C
5 5 B
6 6 A
7 7 B
8 8 A
9 9 A
10 10 C
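If the goal is the '1st'/'2nd'/'3rd' labelling shown in the question, one possible dplyr sketch is below. It is not the code above, just an illustration; the names ranks and Rank and the label vector lb are assumptions introduced here, and it uses the lowercase data from the question.

```r
library(dplyr)

my_df <- data.frame(
  ID = 1:10,
  Group = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c")
)
lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))

# One row per group, largest group first, with its ordinal label attached
ranks <- my_df %>%
  count(Group, sort = TRUE) %>%
  mutate(Rank = lb[row_number()])

# Look up each row's label and let it replace the original group name
my_new_df <- my_df %>%
  left_join(ranks, by = "Group") %>%
  mutate(Group = Rank) %>%
  select(ID, Group)
```

left_join() keeps the rows of my_df in their original order, so my_new_df lines up with my_df by ID.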
Upvotes: 0