user1484009
user1484009

Reputation: 35

Regular Expression to select everything after and including 3rd occurrence

I have a set of excel cells that I need to use a more complex RegEx to remove everything including, and after, the third occurrence of " | " - (space, pipe, space) -- some cells may have less or even 0 occurrences, while others may have 10-20+. there are around 40,000 cells total, all in a column

Canon USA BJC 3000 | BJC 3010 | BJC 6000 | C755 | F30 | F50 | F60 | F80 | I550 | I850 | MP700 | MP730 | S400 | S450 | S500 | S520 | S530 | S600 | S630 | S750 Black Ink Cartridge 420 yield - 4479A003

should become

Canon USA BJC 3000 | BJC 3010 | BJC 6000

any advice on where to start, or a proper regex to pull this?

Upvotes: 0

Views: 1286

Answers (1)

Andrew Clark
Andrew Clark

Reputation: 208475

Not exactly sure how regex in Excel works, but something like the following should work:

((?:[^\|]*\|){2}[^\|]*).*

Replace that with the contents of the first capture group (usually \1 or $1).

Example: http://gskinner.com/RegExr/?31sbq

Explanation:

(                  # start capture group 1
  (?:                # start non-capturing group, repeated exactly twice
     [^\|]*\|          # match any number of non '|' followed by '|' 
  ){2}               # end non-capturing group
  [^\|]*             # match any number of non '|'
)                  # end capturing group
.*                 # match to end of string

By replacing this with the first capture group, everything that the .* matches will be removed.

Upvotes: 2

Related Questions