Akash khan

Reputation: 979

Parse custom data by JavaScript regular expression

I have input like the following in a JavaScript variable, str:

MZ 09/10/2020 Zvwi‡L AbywôZ MCQ Test G DËxY© cÖv_©x‡`i ‡ivj b¤^‡ii ZvwjKv t
(cÖKvwkZ djvd‡j †Kv‡bv Kvi‡Y †Kv‡bv ms‡kva‡bi cÖ‡qvRb n‡j KZ©„cÿ Zv ms‡kva‡bi AwaKvi msiÿY K‡i)
9007 1027 1028 1029
7038 8040 1054 1055
0058 3062 1067 1069
3078 7097 1098 1106 = 16 Rb Kvi‡Y †Kv‡bv ms‡kva‡bi cÖ‡qvRb 3457,00867,1122 jtukh $3308

I want to extract only the four-digit numeric values from 9007 to 1106 (everything before "= 16 Rb") into an array:

[9007,1027,1028,1029,7038,8040,1054,1055,0058,3062,1067,1069,3078,7097,1098,1106]

I tried str.match(/\d{4}/g), but that returns every four-digit run in the text, including 2020 and other unexpected values, and sometimes null.

Upvotes: 2

Views: 285

Answers (3)

Mark Reed

Reputation: 95315

What's the predictable pattern? If it's as simple as "four digits with space on both sides", then you can use something like /(?<=\s)\d{4}(?=\s)/ (using lookaround assertions stops any number's match from "eating" the space around it and preventing that space from matching for the adjacent number):

const t='MZ 09/10/2020 Zvwi‡L AbywôZ MCQ Test G DËxY© cÖv_©x‡`i ‡ivj b¤^‡ii ZvwjKv t\n(cÖKvwkZ djvd‡j †Kv‡bv Kvi‡Y †Kv‡bv ms‡kva‡bi cÖ‡qvRb n‡j KZ©„cÿ Zv ms‡kva‡bi AwaKvi msiÿY K‡i)\n9007 1027 1028 1029\n7038 8040 1054 1055\n0058 3062 1067 1069\n3078 7097 1098 1106 = 16 Rb Kvi‡Y †Kv‡bv ms‡kva‡bi cÖ‡qvRb 3457,00867,1122 jtukh $3308\n'
console.log(t.match(/(?<=\s)\d{4}(?=\s)/g))
[
  '9007', '1027', '1028',
  '1029', '7038', '8040',
  '1054', '1055', '0058',
  '3062', '1067', '1069',
  '3078', '7097', '1098',
  '1106'
]

If the pattern isn't so simple, then of course the solution won't be either. It depends on what you can assume about the text.

Also, if this is client-side code, be aware that some browsers (e.g. Safari) still don't support lookbehind assertions like (?<=\s), and even those that support it do so only in their more recent versions, since the feature was only added to the language in the ES2018 specification.
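If the code has to run on engines that may lack lookbehind, one option is to detect support at runtime. The sketch below is only an illustration: supportsLookbehind and fourDigitTokens are hypothetical helper names, not part of any library. The lookbehind pattern is built from a string on purpose, because a regex literal that the engine cannot parse would stop the whole script from loading.

```javascript
// Hypothetical helper: returns true if this engine can compile a lookbehind.
function supportsLookbehind() {
  try {
    // Built from a string so an unsupported engine throws here at runtime
    // instead of failing to parse the entire script.
    new RegExp('(?<=\\s)\\d');
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical wrapper: four-digit tokens with whitespace on both sides.
function fourDigitTokens(text) {
  if (supportsLookbehind()) {
    const re = new RegExp('(?<=\\s)\\d{4}(?=\\s)', 'g');
    return text.match(re) || []; // match() returns null when nothing is found
  }
  // Fallback without lookbehind: consume the leading space, keep group 1.
  return Array.from(text.matchAll(/\s(\d{4})(?=\s)/g), m => m[1]);
}

console.log(fourDigitTokens('a 1234 5678 b'));
```

Both branches are meant to return the same array of strings, so the caller does not need to know which one ran.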

You could get away with using only lookahead and allowing the preceding space to be consumed by the regex, but then you'll want to use a capture group so that the space isn't part of the returned match, which means you have to use matchAll instead of match and grab the capture group from each result. So this slightly more complicated expression works in more browsers:

Array.from(t.matchAll(/\s(\d{4})(?=\s)/g)).map(m => m[1])
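To see why the capture group is needed, here is a small sketch on a made-up sample string (sample and pairs are illustrative names). Each result from matchAll holds the full match at index 0, which includes the consumed leading space, and the captured digits at index 1.

```javascript
// Made-up sample input, not the asker's data.
const sample = 'x 9007 1027 y';

// Collect [full match, capture group 1] for every match.
const pairs = Array.from(
  sample.matchAll(/\s(\d{4})(?=\s)/g),
  m => [m[0], m[1]]
);

console.log(pairs);
// m[0] is ' 9007' (with the leading space), m[1] is '9007'
```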

Either way, what you wind up with is an array of Strings. Your example output has unquoted numbers with leading zeroes, so it's not clear if that's what you want. If you want an array of Numbers, you can get that by either adding a map call to the first answer or modifying the existing map call in the second:

t.match(/(?<=\s)\d{4}(?=\s)/g).map(Number)
//or
Array.from(t.matchAll(/\s(\d{4})(?=\s)/g)).map(m => Number(m[1]))

Which in your example gets you this array of numeric values:

[
  9007, 1027, 1028, 1029,
  7038, 8040, 1054, 1055,
    58, 3062, 1067, 1069,
  3078, 7097, 1098, 1106
]
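Note that 0058 became 58. If these are roll numbers and the leading zeroes matter, it is probably simpler to keep the strings. If you need Numbers in between (for sorting, say), a possible sketch for restoring the four-digit form afterwards uses padStart; the variable names here are only illustrative.

```javascript
// Illustrative subset of the matched strings.
const rolls = ['9007', '0058', '1106'];

// Converting to Number drops the leading zeroes.
const asNumbers = rolls.map(Number);

// Pad back to four characters when the original form is needed again.
const restored = asNumbers.map(n => String(n).padStart(4, '0'));

console.log(asNumbers, restored);
```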

Upvotes: 2

The fourth bird

Reputation: 163477

You could match the run of 4-digit groups that starts at the beginning of a line and ends right before "= 16 Rb", and then split that match on whitespace chars.

^\d{4}(?:\s+\d{4})+(?= = \d+ Rb\b)
  • ^ Start of a line (the m flag is set in the code below)
  • \d{4} Match 4 digits
  • (?:\s+\d{4})+ Repeat matching 1+ whitespace chars and 4 digits
  • (?= = \d+ Rb\b) Positive lookahead, assert that directly to the right is a space, =, a space, 1+ digits, a space and Rb


    const regex = /^\d{4}(?:\s+\d{4})+(?= = \d+ Rb\b)/gm;
    const str = `MZ 09/10/2020 Zvwi‡L AbywôZ MCQ Test G DËxY© cÖv_©x‡\`i ‡ivj b¤^‡ii ZvwjKv t
(cÖKvwkZ djvd‡j †Kv‡bv Kvi‡Y †Kv‡bv ms‡kva‡bi cÖ‡qvRb n‡j KZ©„cÿ Zv ms‡kva‡bi AwaKvi msiÿY K‡i)
9007 1027 1028 1029
7038 8040 1054 1055
0058 3062 1067 1069
3078 7097 1098 1106 = 16 Rb Kvi‡Y †Kv‡bv ms‡kva‡bi cÖ‡qvRb 3457,00867,1122 jtukh \$3308`;
    console.log(str.match(regex)[0].split(/\s+/));
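One caveat: match returns null when the pattern is not found, so indexing with [0] would throw a TypeError on input that lacks the "= N Rb" marker. A minimal guarded sketch, assuming an empty array is an acceptable result in that case (extractRolls is a hypothetical helper name):

```javascript
// Same pattern as above; only the m flag is needed for a single match.
const regex = /^\d{4}(?:\s+\d{4})+(?= = \d+ Rb\b)/m;

// Hypothetical helper: returns the 4-digit groups, or [] when nothing matches.
function extractRolls(text) {
  const m = text.match(regex);
  return m ? m[0].split(/\s+/) : [];
}

console.log(extractRolls('header\n1111 2222\n3333 4444 = 4 Rb tail 5555'));
console.log(extractRolls('no numbers here'));
```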

Upvotes: 1

Nooruddin Lakhani

Reputation: 6967

Try this, which slices the string between the first and last numbers you expect:

const str = 'MZ 09/10/2020 Zvwi‡L AbywôZ MCQ Test G DËxY© cÖv_©x‡`i ‡ivj b¤^‡ii ZvwjKv t (cÖKvwkZ djvd‡j †Kv‡bv Kvi‡Y †Kv‡bv ms‡kva‡bi cÖ‡qvRb n‡j KZ©„cÿ Zv ms‡kva‡bi AwaKvi msiÿY K‡i) 9007 1027 1028 1029 7038 8040 1054 1055 0058 3062 1067 1069 3078 7097 1098 1106 = 16 Rb Kvi‡Y †Kv‡bv ms‡kva‡bi cÖ‡qvRb 3457,00867,1122 jtukh $3308';

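// Locate the first and last expected numbers (both are hard-coded here).
var n = str.indexOf("9007");
var s = str.indexOf("1106");

// Add the length of "1106" so that the last number is included in the slice.
var res = str.substring(n, s + "1106".length).trim();
var data = res.split(/\s+/);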

Upvotes: 0
