TMOTTM

Reputation: 3381

How to overwrite a factor in R

I have a dataset:

> k
       EVTYPE FATALITIES INJURIES
198704   HEAT        583        0
862634   WIND        158     1150
68670    WIND        116      785
148852   WIND        114      597
355128   HEAT         99        0
67884    WIND         90     1228
46309    WIND         75      270
371112   HEAT         74      135
230927   HEAT         67        0
78567    WIND         57      504

The variables are as follows. As per the first answer by joran, unused levels can be dropped with droplevels, so the 898 levels are not a concern. The illustrative k shown here is the complete dataset, obtained from k <- d1[1:10, 3:5], where d1 is the original dataset.

> str(k)
'data.frame':   10 obs. of  3 variables:
 $ EVTYPE    : Factor w/ 898 levels "   HIGH SURF ADVISORY",..: 243 NA NA NA 243 NA NA 243 243 NA
 $ FATALITIES: num  583 158 116 114 99 90 75 74 67 57
 $ INJURIES  : num  0 1150 785 597 0 ...

I'm trying to overwrite the WIND factor:

> k[k$EVTYPE==factor("WIND"), ]$EVTYPE <- factor("AFDAF")
> k[k$EVTYPE=="WIND", ]$EVTYPE <- factor("AFDAF")

But both commands fail, with messages such as "level sets of factors are different" or "invalid factor level, NA generated".

How should I do this?

Upvotes: 0

Views: 1497

Answers (1)

joran

Reputation: 173577

Try this instead:

k <- droplevels(d1[1:10, 3:5])

Factors (as per the documentation) are simply a vector of integer codes plus a vector of labels, one for each code. These labels are called the "levels". The levels are an attribute, and they persist with your data even when you subset it.
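A minimal sketch of that structure, using a made-up four-element vector f rather than the asker's data:

```r
# Hypothetical toy factor, not taken from d1
f <- factor(c("HEAT", "WIND", "WIND", "HEAT"))

codes <- as.integer(f)   # the underlying integer codes: 1 2 2 1
labs  <- levels(f)       # the labels for those codes: "HEAT" "WIND"

# Subsetting keeps the full set of levels, even those no longer used
heat_only   <- f[f == "HEAT"]
kept_levels <- levels(heat_only)              # still "HEAT" "WIND"

# droplevels() discards the levels that no longer occur
used_levels <- levels(droplevels(heat_only))  # "HEAT"
```

This is why str(k) in the question still reports 898 levels for a ten-row subset.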

This is a feature, since for many statistical procedures it is vital to keep track of all the possible values that variable could have, even if they don't appear in the actual data.
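As a small illustration of why keeping unused levels can be useful, here is a sketch with a made-up vector (the names f and heat_only are only for this example). Tabulating the subset still reports the absent level, with a count of zero:

```r
# Hypothetical toy factor, not taken from d1
f <- factor(c("HEAT", "WIND", "WIND", "HEAT"))
heat_only <- f[f == "HEAT"]

# table() counts every level, including ones absent from the data
counts <- table(heat_only)
counts[["HEAT"]]   # 2
counts[["WIND"]]   # 0
```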

Some people find this irritating and run R with options(stringsAsFactors = FALSE).
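For comparison, here is a rough sketch of how the replacement looks when the column is a plain character vector. The data frame below is a made-up stand-in, and stringsAsFactors = FALSE is passed explicitly to data.frame rather than set globally:

```r
# Hypothetical three-row stand-in for the asker's data
d_chr <- data.frame(EVTYPE = c("HEAT", "WIND", "WIND"),
                    FATALITIES = c(583, 158, 116),
                    stringsAsFactors = FALSE)

# With character data there are no levels to worry about
d_chr$EVTYPE[d_chr$EVTYPE == "WIND"] <- "AFDAF"

# An existing factor column can be converted the same way first
f <- factor(c("HEAT", "WIND"))
f_chr <- as.character(f)
f_chr[f_chr == "WIND"] <- "AFDAF"
```

The trade-off is that the column no longer remembers which values are possible, so convert back with factor() if a later procedure needs levels.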

To simply change the levels, you can do something like this:

d <- read.table(text = "      EVTYPE FATALITIES INJURIES
 198704   HEAT        583        0
 862634   WIND        158     1150
 68670    WIND        116      785
 148852   WIND        114      597
 355128   HEAT         99        0
 67884    WIND         90     1228
 46309    WIND         75      270
 371112   HEAT         74      135
 230927   HEAT         67        0
 78567    WIND         57      504",header = TRUE,sep = "",stringsAsFactors = TRUE)
> str(d)
'data.frame':   10 obs. of  3 variables:
 $ EVTYPE    : Factor w/ 2 levels "HEAT","WIND": 1 2 2 2 1 2 2 1 1 2
 $ FATALITIES: int  583 158 116 114 99 90 75 74 67 57
 $ INJURIES  : int  0 1150 785 597 0 1228 270 135 0 504
> levels(d$EVTYPE) <- c('A','B')
> str(d)
'data.frame':   10 obs. of  3 variables:
 $ EVTYPE    : Factor w/ 2 levels "A","B": 1 2 2 2 1 2 2 1 1 2
 $ FATALITIES: int  583 158 116 114 99 90 75 74 67 57
 $ INJURIES  : int  0 1150 785 597 0 1228 270 135 0 504

Or to just change one:

levels(d$EVTYPE)[2] <- 'C'
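Applied to the original question, a level can also be picked by name rather than by position. The following is only a sketch on a made-up four-row data frame; k and k2 here are stand-ins, not the asker's real k:

```r
# Hypothetical stand-in for the asker's k
k <- data.frame(EVTYPE = factor(c("HEAT", "WIND", "WIND", "HEAT")),
                FATALITIES = c(583, 158, 116, 99))

# Option 1: rename the level itself, selected by its label
levels(k$EVTYPE)[levels(k$EVTYPE) == "WIND"] <- "AFDAF"

# Option 2: add the new level first, then assign, then drop the old one.
# Assigning a value that is not yet a level is what produces
# "invalid factor level, NA generated".
k2 <- data.frame(EVTYPE = factor(c("HEAT", "WIND", "WIND", "HEAT")))
levels(k2$EVTYPE) <- c(levels(k2$EVTYPE), "AFDAF")
k2$EVTYPE[k2$EVTYPE == "WIND"] <- "AFDAF"
k2$EVTYPE <- droplevels(k2$EVTYPE)
```

Option 1 changes every row carrying that level in one step; option 2 is the one to reach for when only some of the WIND rows should change.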

Upvotes: 1
