SHRUTAYU Kale

Reputation: 419

Removing punctuation from text using R

I need to remove punctuation from some text. I am using the tm package, but there is a catch.

For example, suppose the text looks like this:

data <- '"I am a, new comer","to r,"please help","me:out","here"'

Now, when I run

library(tm)
data<-removePunctuation(data)

in my code, the result is:

I am a new comerto rplease helpmeouthere 

but what I expect is:

I am a new comer to r please help me out here

Upvotes: 25

Views: 68959

Answers (2)

Dominic

Reputation: 634

If you had something like

> string <- "hello,you"
> string
[1] "hello,you"

You could do this:

> gsub(",", "", string)
[1] "helloyou"

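gsub() returns a copy of string in which every "," is replaced with "". The variable itself is left unchanged, so assign the result back (string <- gsub(",", "", string)) if you want to keep it.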
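
To connect this to the question: replacing the comma with a space instead of an empty string keeps the words apart. The following is only a small sketch; the variable names removed and spaced are made up for illustration, and fixed = TRUE is used so that the pattern is treated as a literal character rather than a regular expression.

```r
string <- "hello,you"

# Delete the comma: the two words run together
removed <- gsub(",", "", string, fixed = TRUE)

# Replace the comma with a space: the words stay separated
spaced <- gsub(",", " ", string, fixed = TRUE)

print(removed)
print(spaced)

# gsub() returns a new value; the original variable is untouched
print(string)
```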

Upvotes: 0

PeterK

Reputation: 1243

Here's how I read your question, with an answer that is very close to @David Arenburg's in the comment above.

 data <- '"I am a, new comer","to r,"please help","me:out","here"'
 gsub('[[:punct:] ]+',' ',data)
 [1] " I am a new comer to r please help me out here "

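The extra space after [:punct:] puts the space character into the bracket expression, so spaces are matched along with punctuation, and the + matches one or more consecutive characters from that set. Each such run is replaced by the single space given as the replacement, which is what keeps the words apart. This has the side effect, desirable in some cases, of shortening any sequence of spaces to a single space. Note that the result keeps one leading and one trailing space where the outer quotes were.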
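
If the leading and trailing spaces are unwanted, one possible follow-up is to trim them afterwards. This sketch assumes R 3.2.0 or later, where base R provides trimws(); the names cleaned and result are only illustrative.

```r
data <- '"I am a, new comer","to r,"please help","me:out","here"'

# Collapse every run of punctuation and/or spaces into a single space
cleaned <- gsub('[[:punct:] ]+', ' ', data)

# Strip the space left at each end by the outer quotes
result <- trimws(cleaned)

print(result)
```

This should give the string asked for in the question, with no space at either end.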

Upvotes: 51
