user12680

Reputation: 13

String replacement with gsub in R

I have the following problem in R and would like to ask for some suggestions and help.

I have this dataframe:

Dataframe

if (!file.exists("storm")) {
  dir.create("storm")
}
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileurl, destfile = "storm.csv", method = "auto")
storm1 <- read.table("storm.csv", header = TRUE, sep = ",")

Extraction of variables needed

storm1[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

Attempt

I want to tidy up the variables because the dataset contains all kinds of names for the same event. I will give just one example so you can see my problem.

WINTER STORM is the string that I want for all of these names:
storm1<-storm1[,EVTYPE:= sapply(EVTYPE,gsub,pattern="^WINTER STORM$|^WINTER STORM/HIGH WINDS$|^WINTER STORM HIGH WINDS$|^WINTER STORM/HIGH WIND$|^HEAVY SNOW/WINTER STORM$|^BLIZZARD/WINTER STORM$|^WINTER STORMS$","WINTER STORM")]

This is what I have done.

I used sapply to change all the names in the EVTYPE column, but I only want to rename the levels and leave the rest of the dataframe unchanged.

Problem

The output is a dataframe with only one variable, EVTYPE, and I don't know why.

Expectations

To be honest, all the gsub examples I have seen make this look like an easy operation, because it just changes strings, so I don't understand why it does not work.

Can someone help or suggest something else?

Upvotes: 0

Views: 544

Answers (1)

Nate

Reputation: 10671

I don't think you need to use sapply here since gsub is "vectorized" and you already have your regular expression formatted with |'s. Something like this should work for condensing all of those labels to "WINTER STORM":

storm1$EVTYPE <- gsub(pattern = "^WINTER STORM$|^WINTER STORM/HIGH WINDS$|^WINTER STORM HIGH WINDS$|^WINTER STORM/HIGH WIND$|^HEAVY SNOW/WINTER STORM$|^BLIZZARD/WINTER STORM$|^WINTER STORMS$",
                      replacement = "WINTER STORM",
                      x = storm1$EVTYPE)
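
To check the idea without downloading the full file, here is a minimal sketch on a toy data frame. The data frame `toy`, its values, and the shortened pattern `pat` are made up for illustration and are not taken from the storm dataset. It also shows one possible way to rename factor levels directly, in case EVTYPE was read in as a factor.

```r
# Toy stand-in for the storm data; the values are invented for illustration.
toy <- data.frame(
  EVTYPE = c("WINTER STORMS", "BLIZZARD/WINTER STORM", "TORNADO", "WINTER STORM"),
  FATALITIES = c(0, 1, 3, 0),
  stringsAsFactors = FALSE
)

# Shortened version of the pattern; each alternative is anchored with ^ and $.
pat <- "^WINTER STORMS$|^BLIZZARD/WINTER STORM$|^WINTER STORM/HIGH WINDS$"

# gsub() is vectorized over x, so the whole column is processed in one call
# and the other columns of the data frame are left alone.
toy$EVTYPE <- gsub(pattern = pat, replacement = "WINTER STORM", x = toy$EVTYPE)

# If the column is a factor, the same replacement can be applied to the
# level names only; levels that end up with the same name are merged.
f <- factor(c("WINTER STORMS", "TORNADO", "BLIZZARD/WINTER STORM"))
levels(f) <- gsub(pat, "WINTER STORM", levels(f))

print(toy)
print(levels(f))
```

Because only one column is assigned to, the data frame keeps all of its variables, which is the behaviour asked for in the question.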

Upvotes: 2
