nhern121

Reputation: 3921

Particular string split in R

I'd like to split a text string in R, with some special handling. For instance, if the string contains a dot (.) or an exclamation mark (!), I want them returned as separate elements of the resulting list. Below is an example of what I want to get.

  mytext = "Caracas. Montevideo! Chicago."
  split = c("Caracas", ".", "Montevideo", "!", "Chicago", ".")

My current approach is to first use the built-in gsub function to replace "." with " ." (and "!" with " !"), and then split the result on spaces with strsplit.

  mytext=gsub("\\."," .",mytext)
  mytext=gsub("\\!"," !",mytext)
  unlist(strsplit(mytext,split=' '))
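For reference, the two gsub calls can be folded into one by capturing either punctuation mark and re-inserting it with a backreference. This is only a sketch of the same approach; the variable names `spaced` and `tokens` are made up for illustration.

```r
mytext <- "Caracas. Montevideo! Chicago."

# Put a space before every "." or "!" in a single pass;
# \\1 refers to whichever character the group captured.
spaced <- gsub("([.!])", " \\1", mytext)

# Split on literal single spaces (fixed = TRUE: no regex interpretation).
tokens <- unlist(strsplit(spaced, split = " ", fixed = TRUE))
tokens
```
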

So, my question is: is there another way to implement this, either by configuring the parameters of the strsplit function or by some other approach you consider more efficient?

Any help or suggestion is appreciated.

Upvotes: 3

Views: 276

Answers (2)

celiomsj

Reputation: 329

eddi's solution doesn't split on the whitespace, so the spaces stay attached to the following word. Try this:

> regmatches(mytext, gregexpr(text=mytext, pattern="(?=[\\.\\!])|(?:\\s)", perl=TRUE), invert=TRUE)
[[1]]
[1] "Caracas"    "."          "Montevideo" "!"          "Chicago"    "."   
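A possible variant, if it reads more easily: instead of describing the separators and inverting the match, describe the tokens themselves and extract them directly. The sketch below assumes a token is either a run of characters that are neither whitespace nor "." / "!", or a single "." or "!"; `token_pattern` and `tokens` are illustrative names.

```r
mytext <- "Caracas. Montevideo! Chicago."

# A token is either a run of non-space, non-punctuation characters,
# or exactly one "." or "!".
token_pattern <- "[^[:space:].!]+|[.!]"

# gregexpr finds every match; regmatches pulls the matched substrings out.
tokens <- regmatches(mytext, gregexpr(token_pattern, mytext))[[1]]
tokens
```

Other punctuation (commas, question marks) would stay attached to the words unless it is added to both bracket expressions.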

Upvotes: 1

eddi

Reputation: 49448

Look-ahead is what you're looking for here:

strsplit(mytext, split = "(?=(\\.|!))", perl = TRUE)
#[[1]]
#[1] "Caracas"     "."           " Montevideo" "!"           " Chicago"    "." 
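The output above keeps the leading space on " Montevideo" and " Chicago". If that matters, one option (a sketch, assuming R 3.2.0 or later, where trimws is available) is to trim each piece after splitting; `parts` and `tokens` are illustrative names.

```r
mytext <- "Caracas. Montevideo! Chicago."

# Split at the position just before each "." or "!" (zero-width look-ahead).
parts <- strsplit(mytext, split = "(?=(\\.|!))", perl = TRUE)[[1]]

# Remove the leading/trailing whitespace left on each piece.
tokens <- trimws(parts)
tokens
```
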

Upvotes: 3
