Buckethead

Reputation: 135

Simple regex excel extraction

I have an Excel file with thousands of rows like:

Basic medical supply - AIT - needs documents from trasnport - drugs
Basic medical supply - TTT - needs documents from trasnport - patiens and other stuff
random string from excel - TTP - other text like always - marijuana per rectum

How can I write a regex that takes the string after the 3rd "-" character? In this example: "drugs", "patiens and other stuff" and "marijuana per rectum".

It cannot be done in jQuery etc.; it needs to be pure JS because I am scripting it in Pentaho BI. These strings will be very random; only the 3 " - " separators are constant. Is it possible to do such a thing? Maybe slice off everything up to the 3rd " - ", then save what is left to a variable and trim whitespace and non-letter characters?

EDIT:

I have just noticed that in the files every second row in this column contains a product code:

550-1008-000000-405.02.04.03

I have to split this too, e.g. into 550-1008 and 405.02.04.03, so I need to remove the zeros in the middle of the string and get 2 substrings.

Upvotes: 0

Views: 77

Answers (4)

collapsar

Reputation: 17238

The following regex does the trick:

^([^-]+-){3}(.*)

The result is in $2, so a standalone code snippet would look like:

var s = [
          'Basic medical supply - AIT - needs documents from trasnport - drugs'
        , 'Basic medical supply - TTT - needs documents from trasnport - patiens and other stuff'
        , 'random string from excel - TTP - other text like always - marijuana per rectum'
    ]
  , res
  , i;

for ( i=0; i < s.length; i++ ) {
    res = s[i].replace(/^([^-]+-){3}(.*)/g, "$2");
    console.log ( "#" + i + ": '" + s[i] + "' -> '" + res + "'\n" );
}

Live test on regex101.

Explanation

The regex is built on a base sequence: one or more characters other than -, followed by a single -. It matches 3 consecutive occurrences of this base sequence and assigns the remainder of the line to capture group 2.
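To see what the groups hold, here is a minimal sketch using exec on one of the sample rows (the variable names line and m are only for illustration). Note that a repeated group keeps only its last repetition, and that group 2 still carries the space that followed the third -, so trimming it is probably wanted:

```javascript
var line = 'Basic medical supply - AIT - needs documents from trasnport - drugs';
var m = /^([^-]+-){3}(.*)/.exec(line);

// Group 1 is overwritten on each repetition, so it holds the third field only.
console.log(JSON.stringify(m[1])); // " needs documents from trasnport -"

// Group 2 is the remainder of the line, including the leading space.
console.log(JSON.stringify(m[2])); // " drugs"

// Trimming gives the value the question asks for.
console.log(m[2].trim()); // drugs
```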

Caveat

Beware of consecutive - characters: this solution is not compatible with such strings, and the proper way to handle them may depend on your data (e.g. may -- occur as an ASCII representation of a dash?).
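If the separator is always a - with whitespace on both sides (the question says the " - " separators are constant, but that is still an assumption about the data), one possible variation is to match the separator itself instead of excluding every -. This is only a sketch, and afterThirdSeparator is a made-up helper name:

```javascript
// Hypothetical helper: returns the text after the third " - " separator,
// or null when the line has fewer than three such separators.
function afterThirdSeparator(line) {
    // (?:.*?\s-\s){3} lazily consumes three fields, each one ended by
    // whitespace, a dash, and whitespace. Hyphens inside words are left alone.
    var match = /^(?:.*?\s-\s){3}(.*)$/.exec(line);
    return match ? match[1].trim() : null;
}

console.log(afterThirdSeparator('Basic medical supply - AIT - needs documents from trasnport - drugs'));
// drugs
console.log(afterThirdSeparator('pre-filled syringe - AIT - needs documents - anti-inflammatory drugs'));
// anti-inflammatory drugs
```

With this variant a hyphen without surrounding whitespace is treated as part of the field, so it only helps if the data really follows that convention.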

Upvotes: 1

Fallenhero

Reputation: 1583

The regex could just be /^.+-.+-.+-\s*([^-]+)\s*$/

Or even easier: /-\s*([^-]+)\s*$/

That takes everything after the last "-". Or could that string contain "-" itself?
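A quick sketch of the shorter pattern, to show why that last question matters (afterLastDash is just an illustrative name, and the second input row is invented for the test):

```javascript
// Hypothetical helper around the "everything after the last dash" regex.
function afterLastDash(line) {
    var match = /-\s*([^-]+)\s*$/.exec(line);
    // [^-]+ is greedy and may swallow trailing whitespace, hence the trim.
    return match ? match[1].trim() : null;
}

console.log(afterLastDash('Basic medical supply - AIT - needs documents from trasnport - drugs'));
// drugs

// Invented row: a hyphen inside the last field cuts the result short.
console.log(afterLastDash('x - y - z - anti-inflammatory drugs'));
// inflammatory drugs
```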

Upvotes: 2

Vinod Louis

Reputation: 4876

This works simply, without any time-consuming regex:

var s = [
          'Basic medical supply - AIT - needs documents from trasnport - drugs'
        , 'Basic medical supply - TTT - needs documents from trasnport - patiens and other stuff'
        , 'random string from excel - TTP - other text like always - marijuana per rectum'
    ];

s.forEach(function(el){
  console.log(el.split("-")[3].trim());
});
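One thing to watch: split("-")[3] drops anything after a fourth hyphen, and rows with fewer than four parts would throw on .trim(). A possible guard, assuming the separator is always " - " with spaces as in the question (fourthField is a hypothetical helper name):

```javascript
// Hypothetical helper: returns everything after the third " - " separator,
// or null for rows that do not have three separators.
function fourthField(line) {
    var parts = line.split(" - ");
    if (parts.length < 4) {
        return null;
    }
    // Rejoin the tail so a " - " inside the last field is preserved.
    return parts.slice(3).join(" - ").trim();
}

console.log(fourthField('random string from excel - TTP - other text like always - marijuana per rectum'));
// marijuana per rectum
console.log(fourthField('a - b - c - d - e'));
// d - e
console.log(fourthField('550-1008-000000-405.02.04.03'));
// null
```

The null result for the product-code row at least lets the script tell the two kinds of rows apart; splitting the code itself is a separate step.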

Upvotes: 0

gus27

Reputation: 2656

If you really want a regular expression you can use this:

var s = 'Basic medical supply - AIT - needs documents from trasnport - drugs';
var regex = /[^-]+-[^-]+-[^-]+-\s*(.*)/; // \s* skips the space after the third "-"
var match = regex.exec(s);
console.log(match[1]); //outputs "drugs"

But I'd prefer the solution from @VinodLouis (in the comments) without using regular expressions...

Upvotes: 0
