Reputation: 191
I have the string shown below and want to count the total number of occurrences of v| and adv| in it. I am using the following code:
var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
console.log(
(result.split("v|").length - 1) + (result.split("adv|").length - 1)
);
The result should be 2: one for adv| and one for v|. However, the v| inside adv| is counted as well, so the result is 3. How can I count the two as separate words?
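To see where the extra count comes from, the two split calls can be evaluated separately. This is only a minimal sketch; countLiteral is a hypothetical helper name introduced for illustration and is not part of the original code.

```javascript
const input = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";

// Hypothetical helper: counts non-overlapping occurrences of a literal substring.
function countLiteral(text, needle) {
  return text.split(needle).length - 1;
}

// "v|" is found twice: once on its own and once as the tail of "adv|".
console.log(countLiteral(input, "v|"));   // 2
// "adv|" is found once.
console.log(countLiteral(input, "adv|")); // 1
// Adding the two therefore counts the "adv|" occurrence twice: 2 + 1 = 3.
```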
Upvotes: 1
Views: 57
Reputation: 30971
To prevent a match from starting in the middle of a word, add \b at the start of the regex (older JavaScript regex engines do not support lookbehind) and pass it as a regex literal (/.../), not as a string ("...").
Note also that split accepts either a string or a regex. A string separator is matched literally, but inside a regex | has a special meaning (it separates alternatives). To match | literally in a regex, escape it with a backslash: \|
So the first regex should be /\bv\|/ and the second /\badv\|/
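Applied to the string from the question, the two regexes could be used roughly as follows. This is a sketch rather than code from the answer; countSplits is a hypothetical helper name.

```javascript
const text = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";

// Hypothetical helper: number of separators found = number of pieces - 1.
function countSplits(str, separator) {
  return str.split(separator).length - 1;
}

// \b requires a word boundary before "v", so the "v|" inside "adv|" is skipped
// (there is no boundary between "d" and "v").
const vCount = countSplits(text, /\bv\|/);
const advCount = countSplits(text, /\badv\|/);

console.log(vCount + advCount); // 2
```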
Upvotes: 1
Reputation: 370689
For the v| part, you can require that the two characters before the v are not ad, which ensures that the pattern does not match adv|:
var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
console.log(
(result.split(/(?:(?!ad)..|^.?)v\|/).length-1) +
(result.split("adv|").length-1)
);
https://regex101.com/r/f80iGd/1
/(?:(?!ad)..|^.?)v\|/ means:
(?:(?!ad)..|^.?) - A group containing either:
(?!ad).. - Two characters which are not ad, or
^.? - The start of the string, optionally followed by one character
All of the above is followed by v\| , that is, v followed by a literal |
Also, rather than using split to build an array and then taking its length minus one, it might be more intuitive to use match to find occurrences of v| or adv| and count the matches:
var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
console.log(
result.match(/(?:(?!ad)..|^.?)v\|/g).length +
result.match(/adv\|/g).length
);
Note that in newer JavaScript environments, you can also use negative lookbehind to check that the v isn't preceded by ad:
var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
console.log(
result.match(/(?<!ad)v\|/g).length +
result.match(/adv\|/g).length
);
(above snippet may not work in all browsers)
You could also combine the two .match calls into one by using an optional group of ad:
var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
console.log(
result.match(/(?:ad)?v\|/g).length
);
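One caveat with the match-based snippets above: when a global regex finds nothing, match returns null rather than an empty array, so reading .length would throw. A small wrapper can guard against that. This is a sketch under that assumption; countMatches is a hypothetical helper name, and it expects a pattern with the g flag.

```javascript
// Hypothetical helper: counts matches of a global (g-flagged) regex, returning 0 when none exist.
function countMatches(str, pattern) {
  // String.prototype.match returns null when there are no matches.
  const matches = str.match(pattern);
  return matches === null ? 0 : matches.length;
}

const sample = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";

console.log(countMatches(sample, /(?:ad)?v\|/g));  // 2
console.log(countMatches("n|dog", /(?:ad)?v\|/g)); // 0, no exception
```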
Upvotes: 4
Reputation: 3707
You can first split on adv| and then count v| in the remaining pieces, since every adv| contains v|.
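const result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
// Splitting on "adv|" removes those occurrences; there are (pieces - 1) of them.
const advSeparated = result.split("adv|");
// Start from the adv| count, then add the v| occurrences left in each piece.
const totalCount = advSeparated.reduce(
  (acc, piece) => acc + (piece.split("v|").length - 1),
  advSeparated.length - 1
);
console.log(totalCount);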
Upvotes: 0