Reputation: 3381
I have a dataset:
> k
EVTYPE FATALITIES INJURIES
198704 HEAT 583 0
862634 WIND 158 1150
68670 WIND 116 785
148852 WIND 114 597
355128 HEAT 99 0
67884 WIND 90 1228
46309 WIND 75 270
371112 HEAT 74 135
230927 HEAT 67 0
78567 WIND 57 504
The variables are as follows. As per the first answer by joran, unused levels can be dropped by droplevels
, so no worry about the 898 levels, the illustrative k
I'm showing is the complete dataset obtained from k <- d1[1:10, 3:4]
where d1
is the original dataset.
> str(k)
'data.frame': 10 obs. of 3 variables:
$ EVTYPE : Factor w/ 898 levels " HIGH SURF ADVISORY",..: 243 NA NA NA 243 NA NA 243 243 NA
$ FATALITIES: num 583 158 116 114 99 90 75 74 67 57
$ INJURIES : num 0 1150 785 597 0 ...
I'm trying to overwrite the WIND
factor:
> k[k$EVTYPE==factor("WIND"), ]$EVTYPE <- factor("AFDAF")
> k[k$EVTYPE=="WIND", ]$EVTYPE <- factor("AFDAF")
But both commands give me error messages: level sets of factors are different
or invalid factor level, NA generated
.
How should I do this?
Upvotes: 0
Views: 1497
Reputation: 173577
Try this instead:
k <- droplevels(d1[1:10, 3:5])
Factors (as per the documentation) are simply a vector of integer codes and then a simple vector of labels for each code. These are called the "levels". The levels are an attribute, and persist with your data even when subsetting.
This is a feature, since for many statistical procedures it is vital to keep track of all the possible values that variable could have, even if they don't appear in the actual data.
Some people find this irritation and run R using options(stringsAsFactors = FALSE)
.
To simply change the levels, you can do something like this:
d <- read.table(text = " EVTYPE FATALITIES INJURIES
198704 HEAT 583 0
862634 WIND 158 1150
68670 WIND 116 785
148852 WIND 114 597
355128 HEAT 99 0
67884 WIND 90 1228
46309 WIND 75 270
371112 HEAT 74 135
230927 HEAT 67 0
78567 WIND 57 504",header = TRUE,sep = "",stringsAsFactors = TRUE)
> str(d)
'data.frame': 10 obs. of 3 variables:
$ EVTYPE : Factor w/ 2 levels "HEAT","WIND": 1 2 2 2 1 2 2 1 1 2
$ FATALITIES: int 583 158 116 114 99 90 75 74 67 57
$ INJURIES : int 0 1150 785 597 0 1228 270 135 0 504
> levels(d$EVTYPE) <- c('A','B')
> str(d)
'data.frame': 10 obs. of 3 variables:
$ EVTYPE : Factor w/ 2 levels "A","B": 1 2 2 2 1 2 2 1 1 2
$ FATALITIES: int 583 158 116 114 99 90 75 74 67 57
$ INJURIES : int 0 1150 785 597 0 1228 270 135 0 504
Or to just change one:
levels(d$EVTYPE)[2] <- 'C'
Upvotes: 1