89_Simple

Reputation: 3805

Split a string in R, keeping punctuation as separate elements

file_name <- 'I am a good boy who went to Africa, Brazil and India'
strsplit(file_name, ' ')

[[1]]
[1] "I"       "am"      "a"       "good"    "boy"     "who"     "went"    "to"      "Africa," "Brazil" 
[11] "and"     "India"

In the code above, I want every token returned as its own element. However, strsplit returns 'Africa,' as a single element, whereas I want the comma returned separately as well.

The expected output is shown below, with the , as a separate element:

[[1]]
 [1] "I"      "am"     "a"      "good"   "boy"    "who"    "went"   "to"     "Africa" ","     
[11] "Brazil" "and"    "India"

Upvotes: 1

Views: 31

Answers (1)

akrun

Reputation: 887511

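Perhaps this helps. The pattern splits on runs of whitespace, or at the zero-width position between a lowercase letter and a punctuation character, so the comma becomes its own element: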

strsplit(file_name, '\\s+|(?<=[a-z])(?=[[:punct:]])', perl = TRUE)

#[[1]]
#[1] "I"      "am"     "a"      "good"   "boy"    "who"    "went"   
#[8] "to"     "Africa" ","      "Brazil" "and"    "India" 
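
The lookbehind in that pattern only accepts a lowercase letter before the punctuation. As a hedged variation (not part of the original answer), widening it to [[:alnum:]] should also handle words that end in an uppercase letter or a digit. The sample string `txt` and the names `strict` and `loose` are made up for illustration:

```r
# Hypothetical sample string; "USA" ends in an uppercase letter
txt <- "Visit USA, then Peru"

# Original pattern: [a-z] does not match "A", so "USA," stays in one piece
strict <- strsplit(txt, '\\s+|(?<=[a-z])(?=[[:punct:]])', perl = TRUE)[[1]]

# Variation (assumption): allow any alphanumeric character before the punctuation
loose <- strsplit(txt, '\\s+|(?<=[[:alnum:]])(?=[[:punct:]])', perl = TRUE)[[1]]

print(strict)
#[1] "Visit" "USA,"  "then"  "Peru"
print(loose)
#[1] "Visit" "USA"   ","     "then"  "Peru"
```

Both calls need perl = TRUE, because lookbehind and lookahead are not supported by the default regex engine.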

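Or use an extraction method, which matches the tokens themselves (runs of alphanumeric characters, or a comma) instead of the separators: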

regmatches(file_name, gregexpr("[[:alnum:]]+|,", file_name))
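
The extraction pattern above only picks up commas. A minimal sketch, assuming other punctuation marks should also become separate elements, swaps the literal comma for [[:punct:]]. The trailing "!" in the sample string and the name `tokens` are additions for illustration, not part of the question:

```r
# Sample string from the question, with a "!" appended for illustration
file_name <- 'I am a good boy who went to Africa, Brazil and India!'

# Match either a run of alphanumeric characters or one punctuation character
tokens <- regmatches(file_name, gregexpr("[[:alnum:]]+|[[:punct:]]", file_name))

print(tokens[[1]])
# [1] "I"      "am"     "a"      "good"   "boy"    "who"    "went"   "to"     "Africa" ","     
#[11] "Brazil" "and"    "India"  "!"
```

Each punctuation character is returned as its own element, so a run such as "?!" would yield two elements rather than one.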

Upvotes: 2
