Reputation: 380
I'm trying to take the "Names" column of my data frame and change the less frequent names to "Others", in order to simplify a later Java program. For example:
someValue Names
1 Ramon
2 Alex
4 Ramon
1 Luke
2 Han
3 Leia
4 Luke
8 Ramon
20 Luke
Now the names that occur fewer than 3 times have to become "Others":
someValue Names
1 Ramon
2 Others
4 Ramon
1 Luke
2 Others
3 Others
4 Luke
8 Ramon
20 Luke
I am a little lost with this. I hope someone knows a quick way to do it. Thanks in advance!
Upvotes: 0
Views: 45
Reputation: 24613
The following one-liner relabels rows with ifelse (note that the condition here is on someValue):
> ddf$Names = ifelse(ddf$someValue<3, 'Others', ddf$Names)
or:
> ddf$Names = with(ddf, ifelse(someValue<3, 'Others', Names))
> ddf
someValue Names
1 1 Others
2 2 Others
3 4 Ramon
4 1 Others
5 2 Others
6 3 Leia
7 4 Luke
8 8 Ramon
9 20 Luke
Just make sure that the Names column is 'character' and not 'factor'. If it is a factor, convert it first with ddf$Names <- as.character(ddf$Names).
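Since the condition above tests someValue rather than how often each name occurs, it does not relabel by frequency. A minimal sketch of a frequency-based variant of the same ifelse idea, assuming ddf holds the question's data and Names is a character column (the counts variable is only illustrative):

```r
# Sample data copied from the question.
ddf <- data.frame(
  someValue = c(1, 2, 4, 1, 2, 3, 4, 8, 20),
  Names = c("Ramon", "Alex", "Ramon", "Luke", "Han",
            "Leia", "Luke", "Ramon", "Luke"),
  stringsAsFactors = FALSE
)

# For every row, count how many rows share its name.
counts <- ave(seq_along(ddf$Names), ddf$Names, FUN = length)

# Relabel names that occur fewer than 3 times.
ddf$Names <- ifelse(counts < 3, "Others", ddf$Names)
ddf
```

Here ave() returns one count per row, so the test inside ifelse lines up with ddf$Names element by element.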
Upvotes: 1
Reputation: 555
You can use the table function to calculate the frequencies, and then find the names whose frequencies are too low.
An example using character strings:
set.seed(123)
df <- data.frame(
someValue = 1:50,
Names = sample(LETTERS, 50, TRUE),
stringsAsFactors = FALSE
)
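# Frequency of each name
n.tab <- table( df$Names )
# Names that occur more than 3 times and should be kept
n.many <- names( n.tab[ n.tab > 3] )
# Relabel every other name
df[ !(df$Names %in% n.many), "Names"] <- "Others"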
df
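Or the same example, but with a factor (stringsAsFactors = TRUE is needed in R 4.0.0 and later, where data.frame no longer converts strings to factors by default):
set.seed(123)
df <- data.frame(
  someValue = 1:50,
  Names = sample(LETTERS, 50, TRUE),
  stringsAsFactors = TRUE
)
n.tab <- table( df$Names )
n.many <- names( n.tab[ n.tab > 3] )
# Renaming several levels to the same label merges them into one level
levels(df$Names)[ !(levels(df$Names) %in% n.many) ] <- "Others"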
df
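The two variants above can be folded into one function. The following is only a sketch: lump_rare and its arguments min_count and other are hypothetical names, not part of base R. The default threshold follows the question's rule (fewer than 3 occurrences), whereas the random-letter examples above keep names with more than 3.

```r
# Hypothetical helper: relabel values of x that occur fewer than
# min_count times. Accepts a character vector or a factor.
lump_rare <- function(x, min_count = 3, other = "Others") {
  tab <- table(x)
  rare <- names(tab[tab < min_count])
  if (is.factor(x)) {
    # Giving several levels the same label merges them into one level.
    levels(x)[levels(x) %in% rare] <- other
  } else {
    x[x %in% rare] <- other
  }
  x
}

# The Names column from the question.
names_chr <- c("Ramon", "Alex", "Ramon", "Luke", "Han",
               "Leia", "Luke", "Ramon", "Luke")
lump_rare(names_chr)
lump_rare(factor(names_chr))
```

On a data frame it would be applied as df$Names <- lump_rare(df$Names).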
Upvotes: 2