Halian Vilela

Reputation: 381

Add a new level to a factor and substitute existing one

I'm having a lot of trouble dealing with the level names of a factor in a data frame.

I have a large data frame in which one of the columns is a factor with a LOT of levels.

The problem is that some of these values are duplicated, and the next step in my analysis does not accept duplicated data. So I need to rename the duplicated entries before I can move on to that step.

Let me give you a little example:

Say we have this simple data frame with one column:

> df
  col_foo
1    bar1
2    bar2
3    bar3
4    bar2
5    bar4
6    bar5
7    bar3

If we look at the column, we see that it is a factor with 5 distinct levels.

> df$col_foo
[1] bar1 bar2 bar3 bar2 bar4 bar5 bar3
Levels: bar1 bar2 bar3 bar4 bar5
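To make the example reproducible, the data frame can be rebuilt as below. This is a sketch that assumes exactly the values printed above; since R 4.0.0, data.frame() no longer converts strings to factors by default, so the column is wrapped in factor() explicitly.

```r
# Build the one-column example; factor() sorts the levels alphabetically
df <- data.frame(col_foo = factor(c("bar1", "bar2", "bar3", "bar2",
                                    "bar4", "bar5", "bar3")))

df$col_foo           # the 7 values, with Levels: bar1 bar2 bar3 bar4 bar5
nlevels(df$col_foo)  # 5
```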

Now the problem: the values bar2 and bar3 each appear twice. I want to add a new level, something like bar2_X, and use it to replace only the repeated occurrence, so that the data frame becomes:

> df
  col_foo
1    bar1
2    bar2
3    bar3
4  bar2_X
5    bar4
6    bar5
7  bar3_X

Is that possible? The column must stay a factor, so solutions that change its class won't solve my problem unless the result can be coerced back to a factor.

Thanks

Upvotes: 7

Views: 12192

Answers (3)

DART

Reputation: 89

You can edit the levels of the factor variable:

levels(df$col_foo) <- c(levels(df$col_foo),"bar2_X","bar3_X")

and then reassign the repeated entries to the corresponding new levels you added.
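The answer leaves the second step open. One way to fill it in, sketched here under the assumption that the first occurrence of each value stays as is and every later one gets the "_X" suffix, is to locate the repeats with duplicated() instead of hard-coding row numbers:

```r
df <- data.frame(col_foo = factor(c("bar1", "bar2", "bar3", "bar2",
                                    "bar4", "bar5", "bar3")))

# Step 1 (from the answer): register the new level names
levels(df$col_foo) <- c(levels(df$col_foo), "bar2_X", "bar3_X")

# Step 2: TRUE for the second and later occurrences of a value
dup <- duplicated(df$col_foo)

# Step 3: overwrite those entries with the matching "_X" level;
# paste0() on a factor pastes its labels, e.g. "bar2" -> "bar2_X"
df$col_foo[dup] <- paste0(df$col_foo[dup], "_X")

df$col_foo  # bar1 bar2 bar3 bar2_X bar4 bar5 bar3_X, still a factor
```

This relies on every "_X" name having been added as a level first: assigning a value that is not a level turns that entry into NA with an "invalid factor level" warning. It also assumes each value repeats at most once, as in the example; a value occurring three times would end up with two bar2_X entries.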

Upvotes: 3

Greg Snow

Reputation: 49640

If you want all the entries to be unique then a factor does not gain you much over just using a character variable.

Probably the simplest way to do what you want is to coerce the column to a character vector, use the duplicated function to find the duplicates and paste something onto the end of them, and then, if you want, use factor to convert it back to a factor. Possibly something like:

# Append '_x' to repeated values, then convert the result back to a factor
df$col_foo <- factor( ifelse( duplicated(df$col_foo),
                    paste(df$col_foo, '_x', sep=''), as.character(df$col_foo)))
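A quick check of this approach on the example data, as a sketch: it produces bar2_x and bar3_x, and the column is a factor again. Because duplicated() flags every occurrence after the first, a value that appears three or more times receives the same suffix more than once, so the result is not fully unique in that case.

```r
df <- data.frame(col_foo = factor(c("bar1", "bar2", "bar3", "bar2",
                                    "bar4", "bar5", "bar3")))

df$col_foo <- factor(ifelse(duplicated(df$col_foo),
                            paste(df$col_foo, "_x", sep = ""),
                            as.character(df$col_foo)))
is.factor(df$col_foo)     # TRUE
as.character(df$col_foo)  # "bar2_x" in position 4, "bar3_x" in position 7

# Limitation: three copies of a value give two identical "_x" entries
x <- factor(c("bar2", "bar2", "bar2"))
y <- ifelse(duplicated(x), paste(x, "_x", sep = ""), as.character(x))
y                         # "bar2" "bar2_x" "bar2_x"
```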

Upvotes: 4

Richie Cotton

Reputation: 121057

Call make.names with unique = TRUE on your column.

df$col_foo <- factor(make.names(df$col_foo, unique = TRUE))
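On the example data this gives bar2.1 and bar3.1, since make.names() deduplicates by appending ".1", ".2", and so on. It also rewrites any value that is not a syntactically valid R name (a leading digit gets an "X" prefix, spaces and most punctuation become "."). If the labels should otherwise stay untouched, base R's make.unique() can be called directly; it requires a character vector and takes a sep argument. A sketch comparing the two:

```r
df <- data.frame(col_foo = factor(c("bar1", "bar2", "bar3", "bar2",
                                    "bar4", "bar5", "bar3")))

# make.names(): repeats get ".1", ".2", ...; invalid names are also rewritten
a <- factor(make.names(df$col_foo, unique = TRUE))
as.character(a)                      # "bar2.1" in position 4, "bar3.1" in 7
make.names(c("1st level", "bar-2"))  # "X1st.level" "bar.2"

# make.unique(): only deduplicates; sep = "_" gives "bar2_1" and "bar3_1"
df$col_foo <- factor(make.unique(as.character(df$col_foo), sep = "_"))
as.character(df$col_foo)
```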

Upvotes: 10
