R reassign values from a column depending on the frequency

Question

I'm trying to take the column "Names" from my data frame and change the names with a low frequency to "Others", in order to simplify a later Java program. For example:

someValue   Names
1           Ramon
2           Alex
4           Ramon
1           Luke
2           Han
3           Leia
4           Luke
8           Ramon
20          Luke

Now, the names that appear fewer than 3 times have to become "Others":

someValue   Names
1           Ramon
2           Others
4           Ramon
1           Luke
2           Others
3           Others
4           Luke
8           Ramon
20          Luke

I'm a little lost with this. I hope someone knows a quick way to do it. Thanks in advance!

Jasper · Accepted Answer

You can use the table function to count how often each name occurs, and then replace the names whose frequency is too low.
An example using character strings:

set.seed(123)
df <- data.frame(
    someValue = 1:50,
    Names = sample(LETTERS, 50, TRUE),
    stringsAsFactors = FALSE
)
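# Count how often each name occurs
n.tab <- table( df$Names )
# Keep the names that occur at least 3 times, as asked in the question
n.many <- names( n.tab[ n.tab >= 3] )
# Every other name becomes "Others"
df[ !(df$Names %in% n.many), "Names"] <- "Others"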
df
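
Applied to the sample data from the question, the same idea might look like the sketch below. The helper name lump_rare and its arguments min.count and other are hypothetical, chosen only for illustration; the threshold of 3 comes from the question.

```r
# Sample data from the question
df <- data.frame(
    someValue = c(1, 2, 4, 1, 2, 3, 4, 8, 20),
    Names = c("Ramon", "Alex", "Ramon", "Luke", "Han",
              "Leia", "Luke", "Ramon", "Luke"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: replace every value of a character vector that
# occurs fewer than min.count times with the label given in `other`
lump_rare <- function(x, min.count, other = "Others") {
    n.tab <- table(x)                         # frequency of each value
    keep <- names(n.tab[n.tab >= min.count])  # values that are frequent enough
    x[!(x %in% keep)] <- other                # relabel the rest
    x
}

df$Names <- lump_rare(df$Names, min.count = 3)
df
```

Ramon and Luke each occur 3 times and are kept, while Alex, Han and Leia become "Others". Note that table drops NA by default, so with this sketch any NA in the column would also be relabelled.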

Or the same example, but with a factor:

set.seed(123)
df <- data.frame(
    someValue = 1:50,
    Names = sample(LETTERS, 50, TRUE),
    stringsAsFactors = TRUE  # required since R 4.0.0, where the default is FALSE
)
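# Count how often each name occurs
n.tab <- table( df$Names )
# Keep the names that occur at least 3 times, as asked in the question
n.many <- names( n.tab[ n.tab >= 3] )

# Rename all rare levels to "Others"; duplicated levels are merged into one
levels(df$Names)[ !(levels(df$Names) %in% n.many) ] <- "Others"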
df
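
To see what the factor version does to the levels, here is a small sketch on the data from the question, assuming the column is stored as a factor and using the threshold of 3 from the question. Only the levels are rewritten; the rows themselves are never touched directly.

```r
# Sample data from the question, with Names stored as a factor
df <- data.frame(
    someValue = c(1, 2, 4, 1, 2, 3, 4, 8, 20),
    Names = factor(c("Ramon", "Alex", "Ramon", "Luke", "Han",
                     "Leia", "Luke", "Ramon", "Luke"))
)

n.tab <- table(df$Names)             # frequency of each level
n.many <- names(n.tab[n.tab >= 3])   # levels that are frequent enough

# Assigning the same label to several levels collapses them into one level
levels(df$Names)[!(levels(df$Names) %in% n.many)] <- "Others"

levels(df$Names)   # the remaining levels
table(df$Names)    # counts after merging
```

Because the rare levels are merged rather than relabelled row by row, the result stays a factor with three levels, and Alex, Han and Leia are all counted under "Others".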
