How to rename values by frequency in R

Question

I am making several graphs based on the clustering data from DAPC. I need the colors to be the same across all the graphs, and I'd like to use specific colors for the largest groups. What matters for this question is that I get a data set from DAPC like this:

my_df <- data.frame(
  ID = c(1:10),
  Group = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c")
)

> my_df

ID  Group
1   a           
2   b           
3   b           
4   c           
5   a           
6   b           
7   a           
8   b           
9   b           
10  c

I know how to find the group with the most members like this:

freqs <- table(my_df$Group)
freqs <- freqs[order(freqs, decreasing = TRUE)]

> freqs
b a c 
5 3 2

Is there a way to change the values based on their frequency? Each time I rerun DAPC, it changes the groups, so I'd like to write code that does this automatically instead of having to redo it manually. Here's how I'd like the dataframe to be changed:

> my_df                          > my_new_df
ID  Group                        ID  Group
1   a                             1  '2nd'
2   b                             2  '1st'          
3   b                             3  '1st'          
4   c                             4  '3rd'          
5   a                             5  '2nd'          
6   b                             6  '1st'          
7   a                             7  '2nd'          
8   b                             8  '1st'          
9   b                             9  '1st'          
10  c                             10 '3rd'

jay.sf · Accepted Answer

You can use ave to give each row the size of its group, then turn those counts into a factor with the corresponding labels=. To avoid hard-coding, define the labels in a vector lb beforehand.

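# Ordinal labels, defined once so nothing below is hard-coded
lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))

# ave() gives each row the size of its group; factor() sorts those counts
# ascending, so the labels are reversed to make the largest group "1st"
with(my_df, factor(as.numeric(ave(as.character(Group), as.character(Group), FUN=table)),
       labels=rev(lb[seq_along(unique(table(Group)))])))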
#  [1] 2nd 1st 1st 3rd 2nd 1st 2nd 1st 1st 3rd
# Levels: 3rd 2nd 1st
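One thing to be aware of: because the factor above is built from the counts themselves, two groups of the same size would receive the same label. If every group should get its own rank, a possible variation is to sort the frequency table and look up each row's position with match. This is only a sketch under the assumption that ties may be broken by the order in which table() lists the groups; the names by_freq, rank_idx and my_new_df are made up for illustration.

```r
my_df <- data.frame(
  ID = 1:10,
  Group = c("a", "b", "b", "c", "a", "b", "a", "b", "b", "c")
)

lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))

# Group names ordered from most to least frequent
by_freq <- names(sort(table(my_df$Group), decreasing = TRUE))

# Position of each row's group in that ordering, i.e. its frequency rank
rank_idx <- match(my_df$Group, by_freq)

# Copy the data frame and replace Group with the ordinal labels;
# the levels run from "1st" downwards
my_new_df <- my_df
my_new_df$Group <- factor(lb[rank_idx], levels = lb[seq_along(by_freq)])
my_new_df$Group
#  [1] 2nd 1st 1st 3rd 2nd 1st 2nd 1st 1st 3rd
# Levels: 1st 2nd 3rd
```

Since the levels are ordered from largest to smallest group, a fixed color vector can be matched to them in the same order across reruns.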

To convert several columns this way, use sapply, where selected.columns is a vector with the names of the columns to convert.

sapply(my_df[selected.columns], function(x) {
  factor(as.numeric(ave(as.character(x), as.character(x), FUN=table)),
         labels=rev(lb[1:length(unique(table(x)))]))
})
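Note that sapply simplifies its result to a matrix, so the columns do not come back as factors. If the factors should be kept and written back into the data frame, lapply is one option. The following is a minimal sketch, assuming a hypothetical data frame my_df2 with two grouping columns and a made-up helper name relabel_by_freq; the transformation inside is the same one as above.

```r
lb <- c("1st", "2nd", "3rd", paste0(4:10, "th"))

# Hypothetical example data with two grouping columns
my_df2 <- data.frame(
  ID = 1:6,
  GroupA = c("a", "b", "b", "c", "b", "a"),
  GroupB = c("x", "x", "y", "x", "y", "x")
)
selected.columns <- c("GroupA", "GroupB")

# Same transformation as in the answer, wrapped in a helper (name is made up)
relabel_by_freq <- function(x) {
  x <- as.character(x)
  # Size of the group each element belongs to
  counts <- as.numeric(ave(x, x, FUN = table))
  # factor() sorts counts ascending, so reverse the labels
  factor(counts, labels = rev(lb[seq_along(unique(counts))]))
}

# lapply returns a list of factors, which can be assigned back column by column
my_df2[selected.columns] <- lapply(my_df2[selected.columns], relabel_by_freq)
str(my_df2)
```

Each selected column is ranked independently, so "1st" in GroupA and "1st" in GroupB refer to the largest group of their own column.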
