Reputation: 619
Within my dataframe, I'm trying to rename certain observations in the column 'Name' based on how often each name occurs. If any name occurs fewer than 100 times in the dataset, I want to rename all of its observations to "Base" in the Name column. Here is an example:
Game  Home Runs  Name
1     2          Hank Aaron
2     3          Babe Ruth
3     1          Ted Williams
3     4          Hank Aaron
4     2          Ted Williams
...
If Ted Williams's and Babe Ruth's names were to appear fewer than 100 times in the data frame, their names would be replaced with "Base" in every row of the Name column where they occur.
Game  Home Runs  Name
1     2          Hank Aaron
2     3          Base
3     1          Base
3     4          Hank Aaron
4     2          Base
...
Additionally, I need the observations to stay in the same dataframe, as I plan to use the new Name vector as an independent (individual effects) variable in a regression.
Apologies if I over-explained. Just a little lost.
Upvotes: 0
Views: 333
Reputation: 244
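library(dplyr)
library(forcats)
# fct_lump_min() lumps every level seen fewer than `min` times;
# fct_lump(n = 100) would instead keep the 100 most frequent names.
df <- df %>%
  mutate(Name = fct_lump_min(Name, min = 100, other_level = "Base"))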
Upvotes: 1
Reputation: 389185
You can use table to count the number of times each Name occurs in the dataframe, use Filter to keep only those names which occur fewer than 100 times, match them in the original dataframe using %in%, and replace them.
df$Name[df$Name %in% names(Filter(I, table(df$Name) < 100))] <- 'Base'
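To see the steps one at a time, here is a minimal sketch on toy data modelled on the question's example. The column name HomeRuns and the variable min_count are assumptions made for illustration, and the threshold is lowered to 2 so that the effect is visible on five rows.

```r
# Toy data modelled on the question's example (column names are assumed)
df <- data.frame(
  Game = c(1, 2, 3, 3, 4),
  HomeRuns = c(2, 3, 1, 4, 2),
  Name = c("Hank Aaron", "Babe Ruth", "Ted Williams",
           "Hank Aaron", "Ted Williams"),
  stringsAsFactors = FALSE
)

min_count <- 2  # hypothetical threshold; the question uses 100

# Step 1: count how often each name occurs
counts <- table(df$Name)

# Step 2: keep the names that occur fewer than min_count times
rare <- names(counts)[counts < min_count]

# Step 3: overwrite those names in place, in the same data frame
df$Name[df$Name %in% rare] <- "Base"

print(df)
```

Here only Babe Ruth occurs fewer than 2 times, so only that row becomes "Base". If Name is stored as a factor, convert it with as.character() first, because assigning a value that is not an existing level produces NA.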
Upvotes: 0