Dog

Reputation: 2916

How to split by a regex pattern and keep the delimiter in a long string?

I have a long string of addresses, with each address having a structure similar to:

123 Main Street St. Louisville OH 43071,432

I want to split the address string after the state, zip code, and house number (in the example above: OH 43071,432). I have a regex that identifies the zip code and house number in each string (/\d+,\d+/), but splitting on it removes the delimiter from the output.

I've seen other Stack Overflow threads that address similar questions, but none of those solutions work for me. For instance, if I place the regex in a capture group, like (/(\d+,\d+)/), the zip code and house number are returned as a separate element:

[ '123 Main Street St. Louisville OH ',
  '43071,432',

Similarly, adding a lookahead (?! or ?=) to the regex does not help.

How can I successfully split the address strings, so the output will mirror:

[ '123 Main Street St. Louisville OH 43071,432',
   Main Long Road St. Louisville OH 43071,786

The list of addresses I have is:

let addr =
  "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786 High Street Pollocksville NY 56432,54 Holy Grail Street Niagara Town ZP 32908,3200 Main Rd. Bern AE 56210,1 Gordon St. Atlanta RE 13000,10 Pussy Cat Rd. Chicago EX 34342,10 Gordon St. Atlanta RE 13000,58 Gordon Road Atlanta RE 13000,22 Tokyo Av. Tedmondville SW 43098,674 Paris bd. Abbeville AA 45521,10 Surta Alley Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32908,320 Main Al. Bern AE 56210,14 Gordon Park Atlanta RE 13000,100 Pussy Cat Rd. Chicago EX 34342,2 Gordon St. Atlanta RE 13000,5 Gordon Road Atlanta RE 13000,2200 Tokyo Av. Tedmondville SW 43098,67 Paris St. Abbeville AA 45521,11 Surta Avenue Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32918,320 Main Al. Bern AE 56215,14 Gordon Park Atlanta RE 13200,100 Pussy Cat Rd. Chicago EX 34345,2 Gordon St. Atlanta RE 13222,5 Gordon Road Atlanta RE 13001,2200";

Upvotes: 1

Views: 89

Answers (3)

vsemozhebuty

Reputation: 13822

If you need this only on the backend with recent Node.js versions, you can use split() with a lookbehind assertion. The code can also be tested in recent versions of Google Chrome.

const addr = "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786 High Street Pollocksville NY 56432,54 Holy Grail Street Niagara Town ZP 32908,3200 Main Rd. Bern AE 56210,1 Gordon St. Atlanta RE 13000,10 Pussy Cat Rd. Chicago EX 34342,10 Gordon St. Atlanta RE 13000,58 Gordon Road Atlanta RE 13000,22 Tokyo Av. Tedmondville SW 43098,674 Paris bd. Abbeville AA 45521,10 Surta Alley Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32908,320 Main Al. Bern AE 56210,14 Gordon Park Atlanta RE 13000,100 Pussy Cat Rd. Chicago EX 34342,2 Gordon St. Atlanta RE 13000,5 Gordon Road Atlanta RE 13000,2200 Tokyo Av. Tedmondville SW 43098,67 Paris St. Abbeville AA 45521,11 Surta Avenue Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32918,320 Main Al. Bern AE 56215,14 Gordon Park Atlanta RE 13200,100 Pussy Cat Rd. Chicago EX 34345,2 Gordon St. Atlanta RE 13222,5 Gordon Road Atlanta RE 13001,2200";

console.log(addr.split(/(?<=\d+,\d+) /));
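As a quick check on a shorter input, here is a minimal sketch of the same split. The `sample` string is a hypothetical excerpt of the address list, and the code assumes a runtime whose regex engine supports lookbehind:

```javascript
// Hypothetical shortened input: two full addresses plus a trailing fragment.
const sample =
  "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786 High Street";

// Split on a space only when it is immediately preceded by "digits,digits".
// The lookbehind is zero-width, so the zip code and house number stay in the output.
const parts = sample.split(/(?<=\d+,\d+) /);

console.log(parts);
```

Only the space is consumed, so each piece ends with its zip code and house number. Whatever follows the last delimiter (here `High Street`) comes back as a final element.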

Upvotes: 2

guest271314

Reputation: 1

How can I successfully split the address strings, so the output will mirror:

[ '123 Main Street St. Louisville OH 43071,432',
   Main Long Road St. Louisville OH 43071,786

To match the strings in the updated question, you can use the RegExp /[^\s][^,]+,\d+/g with String.prototype.match(). The pattern matches one non-whitespace character, followed by one or more characters that are not commas, followed by a comma and one or more digits.

let addr = "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786 High Street Pollocksville NY 56432,54 Holy Grail Street Niagara Town ZP 32908,3200 Main Rd. Bern AE 56210,1 Gordon St. Atlanta RE 13000,10 Pussy Cat Rd. Chicago EX 34342,10 Gordon St. Atlanta RE 13000,58 Gordon Road Atlanta RE 13000,22 Tokyo Av. Tedmondville SW 43098,674 Paris bd. Abbeville AA 45521,10 Surta Alley Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32908,320 Main Al. Bern AE 56210,14 Gordon Park Atlanta RE 13000,100 Pussy Cat Rd. Chicago EX 34342,2 Gordon St. Atlanta RE 13000,5 Gordon Road Atlanta RE 13000,2200 Tokyo Av. Tedmondville SW 43098,67 Paris St. Abbeville AA 45521,11 Surta Avenue Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32918,320 Main Al. Bern AE 56215,14 Gordon Park Atlanta RE 13200,100 Pussy Cat Rd. Chicago EX 34345,2 Gordon St. Atlanta RE 13222,5 Gordon Road Atlanta RE 13001,2200";

let res = addr.match(/[^\s][^,]+,\d+/g);

console.log(JSON.stringify(res, null, 2));
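One detail worth guarding against: with the g flag, match() returns null rather than an empty array when nothing matches. A minimal sketch, where `excerpt` and the helper `extractAddresses` are hypothetical names introduced only for illustration:

```javascript
// Hypothetical helper: returns every "text,digits" chunk, or [] when none is found.
function extractAddresses(text) {
  // match() with the g flag yields null on no match, so fall back to an empty array.
  return text.match(/[^\s][^,]+,\d+/g) || [];
}

// Hypothetical shortened input: the first two addresses of the list.
const excerpt =
  "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786";

console.log(extractAddresses(excerpt));
console.log(extractAddresses("no delimiter here")); // []
```

Returning an empty array keeps later calls such as map() or forEach() safe without a separate null check.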

Upvotes: 1

CertainPerformance

Reputation: 371203

Because you have overlapping matches, you won't be able to use split. Instead, call .exec repeatedly with a pattern that contains a capturing group, and extract that group from each match. Match a comma or the beginning of the string, then, in a lookahead, capture the address string followed by a comma and digits:

const addr = "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786 High Street Pollocksville NY 56432,54 Holy Grail Street Niagara Town ZP 32908,3200 Main Rd. Bern AE 56210,1 Gordon St. Atlanta RE 13000,10 Pussy Cat Rd. Chicago EX 34342,10 Gordon St. Atlanta RE 13000,58 Gordon Road Atlanta RE 13000,22 Tokyo Av. Tedmondville SW 43098,674 Paris bd. Abbeville AA 45521,10 Surta Alley Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32908,320 Main Al. Bern AE 56210,14 Gordon Park Atlanta RE 13000,100 Pussy Cat Rd. Chicago EX 34342,2 Gordon St. Atlanta RE 13000,5 Gordon Road Atlanta RE 13000,2200 Tokyo Av. Tedmondville SW 43098,67 Paris St. Abbeville AA 45521,11 Surta Avenue Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32918,320 Main Al. Bern AE 56215,14 Gordon Park Atlanta RE 13200,100 Pussy Cat Rd. Chicago EX 34345,2 Gordon St. Atlanta RE 13222,5 Gordon Road Atlanta RE 13001,2200";
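let match;
const matches = [];
// Consume a comma (or the string start), then capture the address inside a lookahead
// so the trailing house number remains available to begin the next match.
const pattern = /(?:^|,)(?=([^,]+,\d+))./g;
// exec() advances pattern.lastIndex on each call and returns null when no match is left.
while ((match = pattern.exec(addr)) !== null) {
  matches.push(match[1]);
}
console.log(matches);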
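The same loop can also be written with String.prototype.matchAll(), which requires the g flag and yields one match object per match. A minimal sketch on a shortened input; `shortAddr`, `overlapPattern` and `overlapping` are hypothetical names, not part of the code above:

```javascript
// Hypothetical shortened input: the first three addresses of the list.
const shortAddr =
  "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786 High Street Pollocksville NY 56432,54";

// The lookahead captures the address without consuming it,
// so the house number after each comma can also start the next match.
const overlapPattern = /(?:^|,)(?=([^,]+,\d+))./g;

// Collect capture group 1 from every match.
const overlapping = Array.from(shortAddr.matchAll(overlapPattern), (m) => m[1]);

console.log(overlapping);
```

Each result after the first begins with the house number that ended the previous one, which is the overlap this approach is meant to preserve.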

Upvotes: 3
