Reputation: 35
I have a set of excel cells that I need to use a more complex RegEx to remove everything including, and after, the third occurrence of " | " - (space, pipe, space) -- some cells may have less or even 0 occurrences, while others may have 10-20+. there are around 40,000 cells total, all in a column
Canon USA BJC 3000 | BJC 3010 | BJC 6000 | C755 | F30 | F50 | F60 | F80 | I550 | I850 | MP700 | MP730 | S400 | S450 | S500 | S520 | S530 | S600 | S630 | S750 Black Ink Cartridge 420 yield - 4479A003
should become
Canon USA BJC 3000 | BJC 3010 | BJC 6000
any advice on where to start, or a proper regex to pull this?
Upvotes: 0
Views: 1286
Reputation: 208475
Not exactly sure how regex in Excel works, but something like the following should work:
((?:[^\|]*\|){2}[^\|]*).*
Replace that with the contents of the first capture group (usually \1
or $1
).
Example: http://gskinner.com/RegExr/?31sbq
Explanation:
( # start capture group 1
(?: # start non-capturing group, repeated exactly twice
[^\|]*\| # match any number of non '|' followed by '|'
){2} # end non-capturing group
[^\|]* # match any number of non '|'
) # end capturing group
.* # match to end of string
By replacing this with the first capture group, everything that the .*
matches will be removed.
Upvotes: 2