Reputation: 419
I need to remove punctuation from some text. I am using the tm package, but there is a catch.
For example, suppose the text looks like this:
data <- '"I am a, new comer","to r,"please help","me:out","here"'
Now, when I run
library(tm)
data<-removePunctuation(data)
in my code, the result is:
I am a new comerto rplease helpmeouthere
But what I expect is:
I am a new comer to r please help me out here
Upvotes: 25
Views: 68959
Reputation: 634
If you had something like
string <- "hello,you"
> string
[1] "hello,you"
You could do this:
> gsub(",", "", string)
[1] "helloyou"
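gsub returns a copy of string in which every "," is replaced with "". The variable itself is not changed, so assign the result back to string if you want to keep it.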
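Since the question asks for the words to stay separated, replacing with a space rather than an empty string may be closer to what is wanted. A minimal sketch on the same example string; the variable name spaced and the fixed = TRUE argument are my additions, the latter so that "," is matched literally rather than as a regular expression:

```r
string <- "hello,you"

# Replace every comma with a single space instead of deleting it.
# fixed = TRUE treats the pattern as a literal string, not a regex.
spaced <- gsub(",", " ", string, fixed = TRUE)

print(spaced)
# [1] "hello you"
```

This only handles commas, so other punctuation such as ":" would need its own call or a character class.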
Upvotes: 0
Reputation: 1243
Here's how I take your question, and an answer that is very close to @David Arenburg's in the comment above.
data <- '"I am a, new comer","to r,"please help","me:out","here"'
gsub('[[:punct:] ]+',' ',data)
[1] " I am a new comer to r please help me out here "
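The extra space after [:punct:] inside the brackets adds the space character to the set being matched, and the + matches one or more consecutive characters from that set. Each such run of punctuation and spaces is replaced with the single space given as the replacement. This has the side effect, desirable in some cases, of shortening any sequence of spaces to a single space.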
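The result above still has a space at each end, because the opening and closing quotes are also replaced. If that matters, one option is to wrap the call in base R's trimws (available since R 3.2.0). A small sketch, where the variable name cleaned is just for illustration:

```r
data <- '"I am a, new comer","to r,"please help","me:out","here"'

# Collapse each run of punctuation and/or spaces into one space,
# then strip the leading and trailing space left by the outer quotes.
cleaned <- trimws(gsub('[[:punct:] ]+', ' ', data))

print(cleaned)
# [1] "I am a new comer to r please help me out here"
```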
Upvotes: 51