dlwhch

Reputation: 167

Split string in parts by minus and plus in R

I want to split the string test = "-1x^2+3x^3-x^8+1-x" into parts at the plus and minus characters in R, keeping each sign with the term that follows it. My goal is to get: "-1x^2" "+3x^3" "-x^8" "+1" "-x"

This didn't work:

strsplit(test, split = "-")
strsplit(test, split = "+")

Upvotes: 5

Views: 480

Answers (4)

The fourth bird

Reputation: 163277

In your examples, you call strsplit with a plus or a minus sign as the pattern, which splits at every occurrence of that character.

Instead, you could split at a zero-width position: assert that what is directly to the left is neither the start of the string nor + or -, and that + or - is directly to the right.

(?<!^|[+-])(?=[+-])

Explanation

  • (?<! Negative lookbehind assertion
    • ^ Start of string
    • | Or
    • [+-] Match either + or - using a character class
  • ) Close lookbehind
  • (?= Positive lookahead assertion
    • [+-] Match either + or -
  • ) Close lookahead

As the pattern uses lookaround assertions, you have to set perl = TRUE so that strsplit uses a Perl-compatible regex.
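To illustrate the point about the regex engine, here is a minimal sketch. The tryCatch wrapper and the variable names are my own additions, not part of the answer. R's default engine does not support lookaround syntax, so I would expect the first call to fail with an invalid-regular-expression error, while the second call runs.

```r
test <- "-1x^2+3x^3-x^8+1-x"
pattern <- "(?<!^|[+-])(?=[+-])"

# Default engine: lookarounds are not supported, so capture the error
# object instead of letting it stop the script.
default_engine <- tryCatch(
  strsplit(test, split = pattern),
  error = function(e) e
)
inherits(default_engine, "error")

# Perl-compatible engine: the lookaround pattern is accepted.
pcre_engine <- strsplit(test, split = pattern, perl = TRUE)[[1]]
pcre_engine
```

Spelling out TRUE rather than T is a small safety measure, since T is an ordinary variable that can be reassigned.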

Example

test <- "-1x^2+3x^3-x^8+1-x"
strsplit(test, split = "(?<!^|[+-])(?=[+-])", perl = TRUE)

Output

[[1]]
[1] "-1x^2" "+3x^3" "-x^8"  "+1"    "-x"  

See an online R demo.


If a space directly to the left should also prevent a split, you can add \s to the character class (the backslash is doubled because the pattern is written as an R string):

(?<!^|[\\s+-])(?=[+-])

See a regex demo.
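A minimal sketch of how the two patterns differ. The input string spaced is a made-up example of mine, not from the question. With the first pattern a space to the left does not block the split, while the \s variant only splits directly after a non-space character.

```r
# Hypothetical input with a space before the second term.
spaced <- "-1x^2 +3x^3-x^8"

# Original pattern: only the start of the string, + and - block a split.
without_space_rule <- strsplit(spaced, "(?<!^|[+-])(?=[+-])", perl = TRUE)[[1]]
without_space_rule
# [1] "-1x^2 " "+3x^3"  "-x^8"

# Variant with \\s in the class: no split directly after whitespace.
with_space_rule <- strsplit(spaced, "(?<!^|[\\s+-])(?=[+-])", perl = TRUE)[[1]]
with_space_rule
# [1] "-1x^2 +3x^3" "-x^8"
```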

Upvotes: 4

G. Grothendieck

Reputation: 269526

This uses gsub to search for any character followed by + or - and inserts a semicolon between the two characters. Then it splits on semicolon.

s <- "-1x^2+3x^3-x^8+1-x"
strsplit(gsub("(.)([+-])", "\\1;\\2", s), ";")[[1]]
## [1] "-1x^2" "+3x^3" "-x^8"  "+1"    "-x"   
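To make the two steps visible, here is the same idea with the intermediate string kept in a variable. The name marked and the fixed = TRUE argument are my additions. Note the assumption that the input never contains a semicolon; if it might, another delimiter that cannot occur in the data would be needed.

```r
s <- "-1x^2+3x^3-x^8+1-x"

# Step 1: put a semicolon between any character and a following + or -.
# The leading "-" has no character before it, so nothing is inserted there.
marked <- gsub("(.)([+-])", "\\1;\\2", s)
marked
# [1] "-1x^2;+3x^3;-x^8;+1;-x"

# Step 2: split on the literal semicolon.
parts <- strsplit(marked, ";", fixed = TRUE)[[1]]
parts
# [1] "-1x^2" "+3x^3" "-x^8"  "+1"    "-x"
```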

Upvotes: 5

ThomasIsCoding

Reputation: 101257

Try

> strsplit(test, split = "(?<=.)(?=[+-])", perl = TRUE)[[1]]
[1] "-1x^2" "+3x^3" "-x^8"  "+1"    "-x"

where (?<=.)(?=[+-]) matches the zero-width position that has some character before it and a + or - directly after it, so the split happens there without consuming the sign.
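A rough sketch of what the (?<=.) part contributes, comparing it against a lookahead-only pattern. The variable names are mine. As far as I understand how strsplit handles zero-length matches, a match at the very start of the remaining text makes it split off a single character, so the lookahead-only pattern tends to separate each sign from its term.

```r
test <- "-1x^2+3x^3-x^8+1-x"
expected <- c("-1x^2", "+3x^3", "-x^8", "+1", "-x")

# Lookahead only: also matches at the start of each remaining piece.
lookahead_only <- strsplit(test, split = "(?=[+-])", perl = TRUE)[[1]]
lookahead_only

# Lookbehind plus lookahead: requires a character before the split point.
with_lookbehind <- strsplit(test, split = "(?<=.)(?=[+-])", perl = TRUE)[[1]]
with_lookbehind

identical(lookahead_only, expected)   # FALSE
identical(with_lookbehind, expected)  # TRUE
```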

Upvotes: 5

AndrewGB

Reputation: 16856

We can pass a regular expression to strsplit that uses a lookahead, (?=...), to find the position just before a plus or minus sign, and a lookbehind, (?<=.), to require at least one character before that position. Because both assertions are zero-width, the split happens at that position and the sign itself is retained rather than dropped.

x <- "-1x^2+3x^3-x^8+1-x"
strsplit(x, "(?<=.)(?=[+])|(?<=.)(?=[-])", perl = TRUE)[[1]]

# [1] "-1x^2" "+3x^3" "-x^8"  "+1"    "-x"   

Upvotes: 7
