ZDinges

Reputation: 53

How to rename values by frequency in R

I am making several graphs based on the clustering data from DAPC. I need the colors to be the same across all the graphs, and I'd like to use specific colors for the largest groups. The important part for this question is that DAPC gives me a data set like this:

my_df <- data.frame(
  ID = c(1:10),
  Group = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c")
)

> my_df

ID  Group
1   a           
2   b           
3   b           
4   c           
5   a           
6   b           
7   a           
8   b           
9   b           
10  c

I know how to find the group with the most members like this:

freqs <- table(my_df$Group)
freqs <- freqs[order(freqs, decreasing = TRUE)]

> freqs
b a c 
5 3 2 

Is there a way to change the values based on their frequency? Each time I rerun DAPC, it changes the groups, so I'd like to write code that does this automatically instead of having to redo it manually. Here's how I'd like the dataframe to be changed:

> my_df                          > my_new_df
ID  Group                        ID  Group
1   a                             1  '2nd'
2   b                             2  '1st'          
3   b                             3  '1st'          
4   c                             4  '3rd'          
5   a                             5  '2nd'          
6   b                             6  '1st'          
7   a                             7  '2nd'          
8   b                             8  '1st'          
9   b                             9  '1st'          
10  c                             10 '3rd'          

Upvotes: 1

Views: 475

Answers (2)

jay.sf

Reputation: 73512

You may use ave and create a factor out of it with the corresponding labels=. To avoid hard-coding, define the labels in a vector lb beforehand.

lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))

with(my_df, factor(as.numeric(ave(as.character(Group), as.character(Group), FUN=table)),
       labels=rev(lb[1:length(unique(table(Group)))])))
#  [1] 2nd 1st 1st 3rd 2nd 1st 2nd 1st 1st 3rd
# Levels: 3rd 2nd 1st
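
As an alternative sketch (not the ave approach above), the sorted frequency table from the question can be used as a lookup: match() gives each row the position of its group in the frequency ordering, and that position indexes into lb. The names by_freq and my_new_df are only illustrative. Note that tied groups get distinct ranks here, whereas the ave version gives them the same label.

```r
my_df <- data.frame(
  ID = 1:10,
  Group = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c")
)
lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))

# Group names ordered from most to least frequent
freqs <- table(my_df$Group)
by_freq <- names(freqs)[order(freqs, decreasing = TRUE)]

# Position of each row's group in that ordering picks the label
my_new_df <- my_df
my_new_df$Group <- lb[match(my_df$Group, by_freq)]
```

With the example data, "b" becomes "1st", "a" becomes "2nd" and "c" becomes "3rd", regardless of what the groups happen to be called in a given DAPC run.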

To convert several columns this way, use sapply, where selected.columns is a vector with the names of the columns to convert.

sapply(my_df[selected.columns], function(x) {
  factor(as.numeric(ave(as.character(x), as.character(x), FUN=table)),
         labels=rev(lb[1:length(unique(table(x)))]))
})
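
If the converted columns should go back into the data frame as factors, lapply with assignment may be more convenient, since sapply simplifies its result to a matrix. A minimal sketch, assuming a hypothetical second column Group2 and a helper named rank_labels (neither is part of the original data):

```r
lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))

# Hypothetical helper: label each value by the size of its group
rank_labels <- function(x) {
  x <- as.character(x)
  counts <- ave(seq_along(x), x, FUN = length)  # group size for every row
  factor(counts, labels = rev(lb[seq_along(unique(counts))]))
}

my_df <- data.frame(
  ID = 1:10,
  Group = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c"),
  Group2 = c("x", "y", "y", "y", "y", "y", "y", "z", "z", "z")  # made-up column
)

selected.columns <- c("Group", "Group2")
# Assigning the list back keeps each column a factor
my_df[selected.columns] <- lapply(my_df[selected.columns], rank_labels)
```

As in the code above, groups with equal counts share a label in this version.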

Upvotes: 1

Duck

Reputation: 39613

Do you mean something like this?

library(dplyr)

my_df %>% left_join(my_df %>% group_by(Group) %>% summarise(N=n())) %>%
  arrange(desc(N)) %>% select(-N)

   ID Group
1   2     B
2   3     B
3   6     B
4   8     B
5   9     B
6   1     A
7   5     A
8   7     A
9   4     C
10 10     C

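Update

This version relabels the groups instead of reordering the rows: the groups sorted by descending frequency are paired, by position, with the groups in their order of first appearance. I hope this helps.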

my_df %>% left_join(my_df %>% group_by(Group) %>% summarise(N=n()) %>% arrange(desc(N)) %>%
                      bind_cols(my_df %>% select(Group) %>% distinct() %>% rename(key=Group)) %>%
                      rename(NewGroup=Group,Group=key)) %>%
  select(-c(Group,N)) %>% rename(Group=NewGroup)

   ID Group
1   1     B
2   2     A
3   3     A
4   4     C
5   5     B
6   6     A
7   7     B
8   8     A
9   9     A
10 10     C
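
If the goal is ordinal labels such as '1st' and '2nd' as in the question, a dplyr sketch along the same lines might look like this. The label vector lb and the names ranks, Rank and my_new_df are assumptions for illustration, and ties are ranked in whatever order arrange() leaves them.

```r
library(dplyr)

my_df <- data.frame(
  ID = 1:10,
  Group = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c")
)
lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))

# One row per group, most frequent first, with its ordinal label
ranks <- my_df %>%
  count(Group, name = "N") %>%
  arrange(desc(N)) %>%
  mutate(Rank = lb[row_number()])

# Attach the label to every row and use it as the new Group
my_new_df <- my_df %>%
  left_join(ranks, by = "Group") %>%
  select(ID, Group = Rank)
```

Because left_join keeps the rows of my_df in their original order, the IDs stay in place and only the Group values change.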

Upvotes: 0
