Eracog

Reputation: 225

Removing punctuation between two words

I have a data frame (df) and I would like to remove the punctuation from it.

However, there is an issue with a dot between two words and a dot at the end of a word, like this:

test.
test1.test2

I use this to remove the punctuation:

library(tm)
removePunctuation(df)

and the result I get is this:

 test
 test1test2

but I would like to get this result:

test
test1 test2

How can I get a space between the two words when the punctuation is removed?

Upvotes: 2

Views: 432

Answers (3)

akrun

Reputation: 887018

We can use gsub to replace the . with a white space and remove the trailing/leading spaces (if any) with trimws.

trimws(gsub('[.]', ' ', str1))
#[1] "test"        "test1 test2"

NOTE: In regex, . by itself matches any character. So we should either keep it inside square brackets ([.]), escape it (\\.), or use the option fixed=TRUE

trimws(gsub('.', ' ', str1, fixed=TRUE))
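To illustrate why the note matters, here is a small sketch comparing the unescaped pattern with the three safe forms. The vector name x is only an illustrative assumption, not part of the original answer.

```r
# Illustrative input (hypothetical name `x`)
x <- c("test.", "test1.test2")

# Unescaped: '.' matches every character, so each character becomes a space
unescaped <- gsub(".", " ", x)

# Three equivalent ways to match a literal dot
bracket <- gsub("[.]", " ", x)
escaped <- gsub("\\.", " ", x)
fixed   <- gsub(".", " ", x, fixed = TRUE)

# Trim the trailing space left by "test."
trimws(fixed)
```

With the unescaped pattern the strings are turned entirely into spaces, which is why one of the three literal forms is needed.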

data

str1 <- c("test.", "test1.test2")

Upvotes: 2

maRtin

Reputation: 6516

You can also use strsplit:

a <- "test."
b <- "test1.test2"

do.call(paste, as.list(strsplit(a, "\\.")[[1]]))

[1] "test"

do.call(paste, as.list(strsplit(b, "\\.")[[1]]))

[1] "test1 test2"
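The same idea could be applied to a whole character vector at once instead of one string at a time. The following is a minimal sketch, assuming the input is a character vector; the names x and res are hypothetical.

```r
# Hypothetical input vector
x <- c("test.", "test1.test2")

# Split each element on a literal dot (fixed = TRUE avoids regex escaping)
parts <- strsplit(x, ".", fixed = TRUE)

# Re-join the pieces of each element with a single space
res <- vapply(parts, paste, character(1), collapse = " ")
res
```

Since strsplit does not return an empty piece after a trailing separator, "test." comes back as "test" without needing trimws.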

Upvotes: 1

sgibb

Reputation: 25726

You can use chartr for single character substitution:

chartr(".", " ", c("test1.test2"))
# [1] "test1 test2"

@akrun suggested trimws to remove the space at the end of your test string:

str <- c("test.", "test1.test2")
trimws(chartr(".", " ", str))
# [1] "test"        "test1 test2"
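Since the question mentions a data frame, the vector approach could be applied column by column. This is only a sketch: the data frame below is made up for illustration, and it assumes every column holds character data.

```r
# Hypothetical data frame; assumes all columns are character
df <- data.frame(a = c("test.", "test1.test2"),
                 b = c("x.y", "z."),
                 stringsAsFactors = FALSE)

# Replace dots with spaces in each column, then trim the ends;
# assigning into df[] keeps the data frame structure
df[] <- lapply(df, function(col) trimws(chartr(".", " ", col)))
df
```

To cover all punctuation rather than only dots, gsub("[[:punct:]]", " ", col) could be swapped in for the chartr call.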

Upvotes: 4
