Reputation: 13
I have the following problem in R and would like to ask for some suggestions and help.
I have this dataframe:
if (!file.exists("storm")) {
  dir.create("storm")
}
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileurl, destfile = "storm.csv", method = "auto")
storm1 <- read.table("storm.csv", header = TRUE, sep = ",")
storm1[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
I want to tidy up the EVTYPE variable because the dataset contains many variants of the same event name. Here is one example so you can see my problem.
WINTER STORM is the string that I want for all of these names:
storm1<-storm1[,EVTYPE:= sapply(EVTYPE,gsub,pattern="^WINTER STORM$|^WINTER STORM/HIGH WINDS$|^WINTER STORM HIGH WINDS$|^WINTER STORM/HIGH WIND$|^HEAVY SNOW/WINTER STORM$|^BLIZZARD/WINTER STORM$|^WINTER STORMS$","WINTER STORM")]
This is what I have done so far.
I used sapply to change all the names in EVTYPE, but I only want to rename the levels and leave the rest of the dataframe unchanged.
Instead, the output is a dataframe with only one variable, EVTYPE, and I don't know why.
Honestly, all the gsub examples I have seen make this look like an easy operation, because it just replaces strings, so I don't understand why it does not work.
Can someone help or suggest something else?
Upvotes: 0
Views: 544
Reputation: 10671
I don't think you need to use sapply here, since gsub is "vectorized" and you already have your regular expression formatted with |'s. Something like this should work for condensing all of those labels to "WINTER STORM":
storm1$EVTYPE <- gsub(pattern = "^WINTER STORM$|^WINTER STORM/HIGH WINDS$|^WINTER STORM HIGH WINDS$|^WINTER STORM/HIGH WIND$|^HEAVY SNOW/WINTER STORM$|^BLIZZARD/WINTER STORM$|^WINTER STORMS$",
                      replacement = "WINTER STORM",
                      x = storm1$EVTYPE)
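If EVTYPE is stored as a factor and the goal is only to rename its levels, one option is to run gsub on the level labels rather than on the whole column. The sketch below uses a small made-up data frame (storm_demo, with invented values) as a stand-in for storm1, and a shortened pattern (winter_pattern); both names are hypothetical and only for illustration. If EVTYPE is a character column instead, the plain gsub call on the column is enough.

```r
# Toy stand-in for storm1; these rows are invented for illustration only.
storm_demo <- data.frame(
  EVTYPE = factor(c("WINTER STORM", "WINTER STORMS", "BLIZZARD/WINTER STORM",
                    "TORNADO", "WINTER STORM/HIGH WINDS")),
  FATALITIES = c(1, 0, 2, 5, 0)
)

# Shortened version of the pattern from the question.
winter_pattern <- "^WINTER STORMS$|^WINTER STORM/HIGH WINDS$|^BLIZZARD/WINTER STORM$"

# gsub() is vectorized, so it works on the vector of level labels directly.
# Assigning repeated labels to levels() merges the matching levels into one.
levels(storm_demo$EVTYPE) <- gsub(pattern = winter_pattern,
                                  replacement = "WINTER STORM",
                                  x = levels(storm_demo$EVTYPE))

# Inspect the result: the other columns are untouched.
levels(storm_demo$EVTYPE)
table(storm_demo$EVTYPE)
```

Because only the levels attribute is modified, the data frame keeps all of its columns and rows.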
Upvotes: 2