Reputation: 225
I have a data frame (df) and I would like to remove punctuation.
However, there is an issue with a dot between two words and at the end of a word, like this:
test.
test1.test2
I use this to remove the punctuation:
library(tm)
removePunctuation(df)
and the result I get is this:
test
test1test2
but I would like to get this result:
test
test1 test2
How can I get a space between the two words when the punctuation is removed?
Upvotes: 2
Views: 432
Reputation: 887018
We can use gsub to replace the . with a space and remove any leading/trailing spaces with trimws.
trimws(gsub('[.]', ' ', str1))
#[1] "test" "test1 test2"
NOTE: In regex, . by itself matches any character. So we should either put it inside square brackets ([.]), escape it (\\.), or use the option fixed=TRUE:
trimws(gsub('.', ' ', str1, fixed=TRUE))
Data used in the examples:
str1 <- c("test.", "test1.test2")
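To make the note about the regex dot concrete, here is a minimal sketch comparing the unescaped pattern with the three literal-dot variants. The variable names (x, unescaped, bracketed, escaped, literal) are illustrative only and not part of the original answer.

```r
x <- c("test.", "test1.test2")

# Unescaped: "." matches every character, so each character becomes a space
unescaped <- gsub(".", " ", x)

# Three ways to match a literal dot; all give the same result
bracketed <- gsub("[.]", " ", x)
escaped   <- gsub("\\.", " ", x)
literal   <- gsub(".", " ", x, fixed = TRUE)

# trimws() then drops the trailing space left by "test."
cleaned <- trimws(literal)
```

With the unescaped pattern the strings come back as nothing but spaces, which is why one of the literal forms is needed.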
Upvotes: 2
Reputation: 6516
You can also use strsplit:
a <- "test."
b <- "test1.test2"
do.call(paste, as.list(strsplit(a, "\\.")[[1]]))
# [1] "test"
do.call(paste, as.list(strsplit(b, "\\.")[[1]]))
# [1] "test1 test2"
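The calls above handle one string at a time. A possible vectorised variant of the same split-and-rejoin idea is sketched below; the names x, parts and res are assumptions for illustration, and fixed = TRUE is used instead of the escaped regex.

```r
x <- c("test.", "test1.test2")

# strsplit() returns a list with one character vector of pieces per input string
parts <- strsplit(x, ".", fixed = TRUE)

# Rejoin the pieces of each string with a single space
res <- vapply(parts, paste, character(1), collapse = " ")
```

Because strsplit drops the empty piece after a trailing dot, "test." comes back as "test" without an extra trimws step. A leading dot would still leave a leading space.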
Upvotes: 1
Reputation: 25726
You can use chartr for single character substitution:
chartr(".", " ", c("test1.test2"))
# [1] "test1 test2"
@akrun suggested trimws to remove the space at the end of your test string:
str <- c("test.", "test1.test2")
trimws(chartr(".", " ", str))
# [1] "test" "test1 test2"
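Since the question is about a data frame rather than a plain character vector, the same substitution can be applied column by column. This is only a sketch: the data frame below and its column names (text, n) are hypothetical, as the question does not show the structure of df.

```r
# Hypothetical data frame; the real df in the question may look different
df <- data.frame(text = c("test.", "test1.test2"),
                 n = c(1, 2),
                 stringsAsFactors = FALSE)

# Helper (illustrative name): replace dots with spaces, then trim the ends
dots_to_spaces <- function(col) trimws(chartr(".", " ", col))

# Apply it to character columns only and leave other columns untouched
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], dots_to_spaces)
```

After this, df$text holds "test" and "test1 test2", while the numeric column n is unchanged.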
Upvotes: 4