Bilal Hussain

Reputation: 191

Counting the number of occurrences of specific words in a string

I have a string (assigned to result in the code below) and I want to count the total number of occurrences of v| and adv| in it. I am using the following code:

var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
console.log(
  (result.split("v|").length - 1) + (result.split("adv|").length - 1)
);

The result should be 2 (1 for adv| and 1 for v|), but the v| inside adv| is counted as well, so the result is 3. How can I count the two as separate words?

Upvotes: 1

Views: 57

Answers (3)

Valdi_Bo

Reputation: 30971

To prevent a match from starting in the middle of a word, add \b at the start of the regex (unfortunately, older JavaScript engines do not support lookbehind) and pass it as a regex literal (/.../), not as a string ("...").

Note also that once the argument of split is a regex rather than a string, | has a special meaning (alternation). To match | literally, escape it with \.

So the first regex should be: /\bv\|/ and the second: /\badv\|/.
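As a minimal sketch, here is the question's split-based count with those two regexes swapped in. The variable names vCount and advCount are made up for illustration:

```javascript
var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";

// \b requires a word boundary before the first letter, so the "v|"
// inside "adv|" is skipped (d and v are both word characters).
var vCount = result.split(/\bv\|/).length - 1;
var advCount = result.split(/\badv\|/).length - 1;

console.log(vCount + advCount); // 2
```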

Upvotes: 1

CertainPerformance

Reputation: 370689

For the v| part, you can require a group of two characters before it that are not ad, which ensures that the pattern does not match the v| inside adv|:

var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
console.log(
  (result.split(/(?:(?!ad)..|^.?)v\|/).length-1) +
  (result.split("adv|").length-1)
);

https://regex101.com/r/f80iGd/1

/(?:(?!ad)..|^.?)v\|/ means:

(?:(?!ad)..|^.?) - A group containing either:

(?!ad).. - Two characters which are not ad, or

^.? - The start of the string, or the start of the string followed by one character

All of the above is followed by v\|, that is, v followed by a literal |.
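To see the pattern's branches in action, here is a small sketch on three short inputs (the sample strings and variable names are made up for illustration). Note that the matched text includes the two characters consumed before v|:

```javascript
var vRegex = /(?:(?!ad)..|^.?)v\|/g;

// Two preceding characters that are not "ad": they become part of the match.
var midString = "may v|hurt".match(vRegex);
console.log(midString); // ["y v|"]

// "v|" at the very start of the string: handled by the ^.? branch.
var atStart = "v|hurt".match(vRegex);
console.log(atStart); // ["v|"]

// "v|" preceded by "ad": rejected, so match() returns null.
var insideAdv = "and adv|then".match(vRegex);
console.log(insideAdv); // null
```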

Also, rather than using split to construct an array and then taking its length minus one, it might be more intuitive to use match to find the occurrences of v| or adv| and count the matches:

var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
console.log(
  result.match(/(?:(?!ad)..|^.?)v\|/g).length +
  result.match(/adv\|/g).length
);
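One caveat with match: when a regex with the g flag finds nothing, match returns null rather than an empty array, so reading .length would throw on a string that contains no v| or adv| at all. A minimal sketch of a guard, using a hypothetical helper named countMatches:

```javascript
// Hypothetical helper (not a built-in): count the matches of a global regex,
// falling back to an empty array because match() returns null on no match.
function countMatches(text, regex) {
  return (text.match(regex) || []).length;
}

var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
var total =
  countMatches(result, /(?:(?!ad)..|^.?)v\|/g) +
  countMatches(result, /adv\|/g);
console.log(total); // 2

// A string without any adv| yields 0 instead of a TypeError.
var none = countMatches("det|the n|dog", /adv\|/g);
console.log(none); // 0
```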

Note that in newer JavaScript environments, you can also use a negative lookbehind to check that the v isn't preceded by ad:

var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
console.log(
  result.match(/(?<!ad)v\|/g).length +
  result.match(/adv\|/g).length
);

(The snippet above requires lookbehind support, so it may not work in all browsers.)
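If older engines have to be supported, one possible approach is to build the lookbehind regex from a string at runtime, so that an unsupported pattern throws a catchable SyntaxError instead of breaking the whole script at parse time. This is only a sketch; the function name makeStandaloneVRegex is made up for illustration:

```javascript
// Hypothetical helper: prefer the lookbehind pattern, and fall back to the
// lookbehind-free pattern from earlier in this answer if it is rejected.
function makeStandaloneVRegex() {
  try {
    // Built from a string so a missing feature surfaces as a runtime error.
    return new RegExp("(?<!ad)v\\|", "g");
  } catch (e) {
    return /(?:(?!ad)..|^.?)v\|/g;
  }
}

var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
var vMatches = result.match(makeStandaloneVRegex()) || [];
var advMatches = result.match(/adv\|/g) || [];
console.log(vMatches.length + advMatches.length); // 2
```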

You could also combine the two .match conditions into one, by using an optional group of ad:

var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";
console.log(
  result.match(/(?:ad)?v\|/g).length
);
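If the separate counts are needed and not just the total, the same combined regex can drive a tally keyed by the matched text. A rough sketch (the counts object is made up for illustration; it assumes any v| directly preceded by ad should be counted as adv|):

```javascript
var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";

// Each match is either "adv|" or "v|", so the matched text is the key.
var counts = { "adv|": 0, "v|": 0 };
var combined = /(?:ad)?v\|/g;
var m;
while ((m = combined.exec(result)) !== null) {
  counts[m[0]] += 1;
}

console.log(counts["adv|"], counts["v|"]); // 1 1
```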

Upvotes: 4

Raghav Garg

Reputation: 3707

You can first split on adv| and then count v| in the remaining pieces, since every adv| contains a v|.

var result = "coord|and adv|then pro|it mod|may v|hurt det|the n|dog";

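// Splitting on "adv|" removes every adv| from the pieces,
// so any v| left in a piece is a standalone v|.
const advSeparated = result.split("adv|");

// Start from the adv| count, then add the v| count of each piece.
const totalCount = advSeparated.reduce((acc, string) =>
  acc + (string.split('v|').length - 1)
, advSeparated.length - 1);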


console.log(totalCount);

Upvotes: 0
