Remove all punctuation from a csv after importing it

Question

Let's say I have a data frame (df) that contains the following data:

df = data.frame(name=c("David","Mark","Alice"),
                income=c("5,000","10,00","$50.55"),
                state=c("KS?","FL","CA;"))

I want to remove all punctuation from this data frame collectively. Of course, I could take each column as an individual vector and run a gsub command on it (see below), but I want to remove all punctuation in the whole data frame.

gsub("[?.;!¡¿·']", "", df$state)

Is there a way to specify this in R without writing a for loop or using an apply function to apply a function to each data frame column?

Simon O'Hanlon · Accepted Answer

Like @joran said, you can use sed to strip out the punctuation you want to get rid of, like this...

#  Writing your data out to a tab-separated file
write.table( df , "~/input.txt" , sep = "\t" )

#  Reading it back in again, sans punctuation (sed filters the same file)
read.table( pipe( "sed 's/[[:punct:]]//g' ~/input.txt" ) , header = TRUE )
#   name income state
#1 David   5000    KS
#2  Mark   1000    FL
#3 Alice   5055    CA

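sed processes your file line by line as it is being read into R. Using the [[:punct:]] regexp class ensures you really do remove all punctuation, which includes the decimal point in $50.55 (hence 5055 above) and the quotes that write.table puts around strings.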

And it can all be driven from within R, provided sed is available on your system. Lovely.
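
If sed is not available, a rough pure-R sketch of the same idea is to go through a character matrix: gsub works on every cell of the matrix at once, so no explicit for loop or apply call is needed. This is only an illustration and not part of the sed approach above. It assumes every column can be treated as text, the names m and clean are made up for the example, and any numeric columns would need converting back afterwards (for example with as.numeric).

```r
# Example data from the question
df <- data.frame(name   = c("David", "Mark", "Alice"),
                 income = c("5,000", "10,00", "$50.55"),
                 state  = c("KS?", "FL", "CA;"),
                 stringsAsFactors = FALSE)

# as.matrix() coerces the whole data frame to a single character matrix
m <- as.matrix(df)

# gsub() is vectorised over every cell; assigning into m[] keeps the
# dimensions and column names of the matrix
m[] <- gsub("[[:punct:]]", "", m)

# Back to a data frame (all columns are character at this point)
clean <- as.data.frame(m, stringsAsFactors = FALSE)
clean
```

As with the sed version, the decimal point counts as punctuation, so narrow the pattern (as in the character class from the question) if some characters should be kept.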
