Reputation: 381
I'm having trouble dealing with the level names of a factor in a data frame.
I have a large data frame in which one of the columns is a factor with a LOT of levels.
The problem is that some of the values are duplicated, and the next step in my analysis does not accept duplicated data. So I need to rename the duplicated entries before I can move on.
Here is a small example:
Say we have this simple data frame with one column:
> df
col_foo
1 bar1
2 bar2
3 bar3
4 bar2
5 bar4
6 bar5
7 bar3
If we look at the column, we see that it is a factor with 5 distinct levels.
> df$col_foo
[1] bar1 bar2 bar3 bar2 bar4 bar5 bar3
Levels: bar1 bar2 bar3 bar4 bar5
OK, here is the problem. Notice that the values bar2 and bar3 are duplicated. What I want to know is how I can add a new level name, something like bar2_X, and substitute it for only the duplicated occurrence. The data frame should then become this:
> df
col_foo
1 bar1
2 bar2
3 bar3
4 bar2_X
5 bar4
6 bar5
7 bar3_X
Is that possible? I cannot change the class of the column; it must remain a factor, so solutions that change it will not work for me unless the result can be coerced back to a factor.
Thanks
Upvotes: 7
Views: 12192
Reputation: 89
You can edit the levels of the factor variable:
levels(df$col_foo) <- c(levels(df$col_foo),"bar2_X","bar3_X")
and then assign one of the new levels you added to each repeated entry.
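A minimal sketch of that second step, using the example data from the question. The helper names dup and new_vals are made up for illustration, and the "_X" suffix is taken from the question. Note that this only guarantees uniqueness when a value occurs at most twice; a third occurrence would get the same "_X" name as the second.

```r
# Example data from the question, built explicitly as a factor
df <- data.frame(col_foo = factor(c("bar1", "bar2", "bar3", "bar2",
                                    "bar4", "bar5", "bar3")))

# TRUE for every occurrence after the first one
dup <- duplicated(df$col_foo)

# Replacement names for the repeated entries (hypothetical "_X" suffix)
new_vals <- paste0(as.character(df$col_foo[dup]), "_X")

# Register the new names as levels first; assigning a value that is
# not an existing level would produce NA with a warning
levels(df$col_foo) <- union(levels(df$col_foo), new_vals)

# Overwrite only the repeated entries; the column stays a factor
df$col_foo[dup] <- new_vals
```

With this approach the column is never converted away from a factor, which matches the constraint stated in the question.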
Upvotes: 3
Reputation: 49640
If you want all the entries to be unique then a factor does not gain you much over just using a character variable.
Probably the simplest way to do what you want is to coerce to a character vector, use the duplicated function to find the duplicates and paste something onto the end of them, then, if you want, use factor to coerce the result back to a factor. Possibly something like:
df$col_foo <- factor(ifelse(duplicated(df$col_foo),
                            paste(df$col_foo, '_x', sep=''),
                            as.character(df$col_foo)))
Upvotes: 4
Reputation: 121057
Call make.names with unique = TRUE on your column.
df$col_foo <- factor(make.names(df$col_foo, unique = TRUE))
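To show what this produces on the example data, here is a small sketch. The second half uses make.unique with sep = "_", which is not part of the answer above; it is offered as a possible alternative when an underscore suffix is preferred and the values should not be altered into syntactically valid names, which make.names also does. The variable names with_names and with_unique are made up for illustration.

```r
# Example data from the question, built explicitly as a factor
df <- data.frame(col_foo = factor(c("bar1", "bar2", "bar3", "bar2",
                                    "bar4", "bar5", "bar3")))

# make.names() with unique = TRUE appends ".1", ".2", ... to repeats
with_names <- factor(make.names(df$col_foo, unique = TRUE))
as.character(with_names)
# "bar1" "bar2" "bar3" "bar2.1" "bar4" "bar5" "bar3.1"

# make.unique() requires a character vector, hence as.character();
# sep = "_" gives suffixes such as "bar2_1"
with_unique <- factor(make.unique(as.character(df$col_foo), sep = "_"))
as.character(with_unique)
# "bar1" "bar2" "bar3" "bar2_1" "bar4" "bar5" "bar3_1"
```

Both variants number each further repeat separately, so they stay unique even when a value occurs three or more times.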
Upvotes: 10