Reputation: 2984

Replacing column values in data frame, not included in list

I have a data.frame in R, like this:

fruits
   X1  X2      X3
   aa  kiwi    15
   ba  orange  25
   cc  lemon   23
   ba  apple   17
   cc  lemon   19
   cc  orange  18
   cc  orange  21
   ba  banana  17

I'd like to replace all values in column X2 except "orange" and "lemon" with "other". How can I do this in R?

Example data:

fruits <- structure(list(X1 = structure(c(1L, 2L, 3L, 2L, 3L, 3L, 3L, 2L
), .Label = c("aa", "ba", "cc"), class = "factor"), X2 = structure(c(3L, 
5L, 4L, 1L, 4L, 5L, 5L, 2L), .Label = c("apple", "banana", "kiwi", 
"lemon", "orange"), class = "factor"), X3 = c(15L, 25L, 23L, 
17L, 19L, 18L, 21L, 17L)), .Names = c("X1", "X2", "X3"), class = "data.frame", row.names = c(NA, 
-8L))

Upvotes: 2

Answers (3)

Gavin Simpson

Reputation: 174898

An easy way is to coerce the factor to a character vector, then identify which elements are not in the required classes and replace them with "other", and finally coerce back to a factor.

There are two variations on this theme, the first using the replace() function:

transform(fruits,
          X2 = factor(replace(as.character(X2), 
                              list = !X2 %in% c("orange","lemon"),
                              values = "other")))

which gives:

> transform(fruits, X2 = factor(replace(as.character(X2), 
+                                       list = !X2 %in% c("orange","lemon"),
+                                       values = "other")))
  X1     X2 X3
1 aa  other 15
2 ba orange 25
3 cc  lemon 23
4 ba  other 17
5 cc  lemon 19
6 cc orange 18
7 cc orange 21
8 ba  other 17

Or you can do it by hand:

fruits <- transform(fruits, 
                    X2 = {x <- as.character(X2)
                          x[!x %in% c("orange","lemon")] <- "other"
                          factor(x)})
> fruits
  X1     X2 X3
1 aa  other 15
2 ba orange 25
3 cc  lemon 23
4 ba  other 17
5 cc  lemon 19
6 cc orange 18
7 cc orange 21
8 ba  other 17

I use transform() here so that the manipulation happens inside an environment where X2 is visible, without having to write things like fruits$X2, which gets tedious to type out.
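For comparison, here is a minimal sketch of the same three steps written without transform(), spelling out fruits$X2 each time. The small data frame is a cut-down stand-in for the example data, assumed here only to keep the snippet self-contained.

```r
# Cut-down stand-in for the example data (assumption: X2 is a factor)
fruits <- data.frame(X1 = c("aa", "ba", "cc", "ba"),
                     X2 = factor(c("kiwi", "orange", "lemon", "apple")),
                     X3 = c(15L, 25L, 23L, 17L))

# Coerce to character, replace everything outside the wanted set,
# then coerce back to a factor
x <- as.character(fruits$X2)
x[!x %in% c("orange", "lemon")] <- "other"
fruits$X2 <- factor(x)

fruits
```

The result is the same; transform() only saves repeating the data frame name.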

Upvotes: 2

csgillespie

Reputation: 60492

What about:

R> fruits = data.frame(X1 = 1:3, X2 = c("kiwi", "orange", "lemon"))
R> fruits$X2 = as.character(fruits$X2)
R> fruits[!(fruits$X2 %in% c("lemon", "orange")),]$X2 = "Other"
R> fruits
  X1     X2
1  1  Other
2  2 orange
3  3  lemon

In the above solution, I converted the factor to character. You can avoid the explicit conversion in either of these ways:

When you create a data frame, use the argument stringsAsFactors = FALSE
If you use read.csv, use the stringsAsFactors argument in the same way

Alternatively, you can work with the factor directly:

R> fruits$X2 = factor(fruits$X2, levels = c(unique(as.character(fruits$X2)), "Other"))
R> fruits[!(fruits$X2 %in% c("lemon", "orange")),]$X2 = "Other"
R> fruits
  X1     X2
1  1  Other
2  2 orange
3  3  lemon

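Notice that the first line extends the levels of the factor to include "Other". Without that step, assigning a value that is not an existing level produces NA and a warning.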
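As a further sketch of working with the factor directly (a variation, not the approach shown above), the unwanted levels can be renamed in one step, because giving several levels the same label merges them. The `keep` vector is a name introduced here for illustration.

```r
fruits <- data.frame(X1 = 1:5,
                     X2 = factor(c("kiwi", "orange", "lemon", "apple", "orange")))

keep <- c("lemon", "orange")   # hypothetical helper name, not from the original

# Rename every level that is not in `keep`; identical labels are merged
# into a single level, so no character conversion is needed
levels(fruits$X2)[!levels(fruits$X2) %in% keep] <- "Other"

levels(fruits$X2)
fruits
```

This touches only the level labels, so it also works when the column contains repeated values.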

Upvotes: 1

Nick Sabbe

Reputation: 11946

First create a variable indicating the rows to be altered. You can do this e.g. like this:

shouldBecomeOther<-!(fruits$X2 %in% c("orange", "lemon"))

Then use that indexer:

fruits$X2[shouldBecomeOther]<- "other"

Note that if the column is a factor (highly likely), it will take some more work, like this:

tmp<-as.character(fruits$X2)
tmp[shouldBecomeOther]<-"other"
fruits$X2<-factor(tmp)
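The character and factor cases can be wrapped in one small function. This is only a sketch: `collapse_to_other` and its arguments are names made up here, not an existing R function.

```r
# Hypothetical helper: replace every value not in `keep` with `other`,
# returning a factor if the input was a factor and a character vector otherwise
collapse_to_other <- function(x, keep, other = "other") {
  was_factor <- is.factor(x)
  tmp <- as.character(x)
  # Note: NA values are also replaced, since NA %in% keep is FALSE
  tmp[!(tmp %in% keep)] <- other
  if (was_factor) factor(tmp) else tmp
}

fruits <- data.frame(X2 = factor(c("kiwi", "orange", "lemon", "apple")))
fruits$X2 <- collapse_to_other(fruits$X2, keep = c("orange", "lemon"))
fruits$X2
```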

Upvotes: 5
