887

Reputation: 619

How to rename observations based upon frequency in R?

In my data frame, I'm trying to rename observations in the 'Name' column based on how often each name occurs. Specifically, if a name occurs fewer than 100 times in the dataset, I want to replace it with "Base" in every row where it appears. Here is an example:

Game   Home Runs     Name 

1          2        Hank Aaron
2          3        Babe Ruth
3          1        Ted Williams
3          4        Hank Aaron
4          2        Ted Williams
...

If Ted Williams's and Babe Ruth's names each appeared fewer than 100 times in the data frame, every occurrence of their names in the Name column would be replaced with "Base":

Game   Home Runs     Name 

1          2        Hank Aaron
2          3        Base
3          1        Base
3          4        Hank Aaron
4          2        Base
...

Additionally, I need the renamed observations to stay in the same data frame, as I plan to use the new Name vector as an independent (individual effects) variable in regressions.

Apologies if I over-explained; I'm just a little lost.

Upvotes: 0

Views: 333

Answers (2)

Jingxin Zhang

Reputation: 244

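library(dplyr)    # provides %>% and mutate()
library(forcats)

# fct_lump(n = 100) would keep the 100 most frequent names, whatever their counts;
# fct_lump_min() instead lumps every name that occurs fewer than `min` times
df <- df %>%
   mutate(Name = fct_lump_min(Name, min = 100, other_level = "Base"))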
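
For a frequency threshold, `fct_lump_min()` is the forcats function that lumps by occurrence count (the `n` argument of `fct_lump()` instead keeps the n most frequent levels). Below is a minimal sketch on toy data, assuming forcats 0.5.0 or later. The vector `name` and the threshold of 2 are made up to keep the example small; the question uses 100.

```r
library(forcats)

# Hypothetical toy names; the threshold of 2 stands in for the question's 100
name <- c("Hank Aaron", "Babe Ruth", "Ted Williams", "Hank Aaron", "Hank Aaron")

# Levels occurring fewer than `min` times are collapsed into other_level
lumped <- fct_lump_min(name, min = 2, other_level = "Base")

print(as.character(lumped))
```

The result is a factor, which is usually what you want for an individual-effects term in a regression.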

Upvotes: 1

Ronak Shah

Reputation: 389185

You can use table to count the number of times each Name occurs in the data frame, use Filter to keep only those names that occur fewer than 100 times, match them against the original column with %in%, and replace the matches.

df$Name[df$Name %in% names(Filter(I, table(df$Name) < 100))] <- 'Base'
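
The same idea can be sketched step by step on toy data. The data frame below and the threshold of 2 are assumptions made only to keep the example small (the question uses 100), and Name is assumed to be a character column.

```r
# Hypothetical toy data mirroring the question's layout
df <- data.frame(
  Game = c(1, 2, 3, 3, 4),
  HomeRuns = c(2, 3, 1, 4, 2),
  Name = c("Hank Aaron", "Babe Ruth", "Ted Williams", "Hank Aaron", "Hank Aaron"),
  stringsAsFactors = FALSE
)

min_count <- 2                              # assumed threshold; the question uses 100
counts <- table(df$Name)                    # occurrences of each name
rare <- names(counts[counts < min_count])   # names below the threshold

# Overwrite the rare names in place, keeping everything in the same data frame
df$Name[df$Name %in% rare] <- "Base"
print(df)
```

If Name is a factor rather than a character vector, convert it with as.character() first; assigning a value that is not an existing level produces NA with a warning.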

Upvotes: 0
