Bilal Hussain

Reputation: 191

Count the number of words in the line beginning with a particular word

I want to count the number of words in a particular line that begins with a specific ID (e.g. *AUY). So far I have tried the regex below to find the line, but it does not take the "*" at the start into account:

^ *(.*\b(?:\\*AUY)\b.*) *$

I have the following test string:

*AUY:   today is holiday so Peter and Mary do not need to go to work .
%mor:   n|today cop|be&3s n|holiday conj|so n:prop|Peter conj|and n:prop|Mary v|do neg|not v|need inf|to v|go prep|to n|work .
%snd:   <00:00:00><00:07:37>
%AUY:   ok_pfp (0.40) er today is holiday errfr ::: so er Peter and Mary {is} ~ er do not need errfr ::: to go to work . errfr :;:a |

The result should be only the first line, but the matches include both the first and the last line (as seen on Rubular).

Upvotes: 0

Views: 94

Answers (3)

Mustofa Rizwan

Reputation: 10466

Try this:

/^.*?\*AUY:(.*?)$/gmi

Explanation

  1. ^ asserts position at the start of a line
  2. .*? matches any character (except line terminators) between zero and unlimited times, as few times as possible (lazy)
  3. \* matches the character * literally
  4. AUY: matches the characters AUY: literally
  5. (.*?) captures the rest of the line: any character (except line terminators), matched lazily
  6. $ asserts position at the end of a line
  7. g modifier: global; do not return after the first match
  8. m modifier: multiline; causes ^ and $ to match at the beginning/end of each line (not only the beginning/end of the string)
  9. i modifier: case-insensitive


Code Sample:

function countWord() {
    const regex = /^.*?\*AUY:(.*?)$/gmi;
    const str = `*AUY:  today is holiday so Peter and Mary do not need to go to work .
%mor:   n|today cop|be&3s n|holiday conj|so n:prop|Peter conj|and n:prop|Mary v|do neg|not v|need inf|to v|go prep|to n|work .
%snd:   <00:00:00><00:07:37>
%AUY:   ok_pfp (0.40) er today is holiday errfr ::: so er Peter and Mary {is} ~ er do not need errfr ::: to go to work . errfr :;:a |`;
    let m;

    while ((m = regex.exec(str)) !== null) {
        // This is necessary to avoid infinite loops with zero-width matches
        if (m.index === regex.lastIndex) {
            regex.lastIndex++;
        }
        // match() returns null when the captured text contains no words,
        // so fall back to an empty array before reading .length
        alert((m[1].match(/\b\w+\b/g) || []).length);
    }
}
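
As a rough sketch of a variation on the sample above, the function below returns the counts instead of showing them with alert, and places \*AUY: directly after ^ so that only lines that begin with the ID are considered. The name countWordsAfterId, the shortened sample string, and the use of String.prototype.matchAll are assumptions made for illustration; they are not part of the original sample.

```javascript
// Hypothetical helper: one word count per line that starts with "*AUY:".
function countWordsAfterId(str) {
  // With the m flag, ^ and $ anchor at each line; (.*) captures the rest of the line.
  const regex = /^\*AUY:(.*)$/gmi;
  const counts = [];
  for (const m of str.matchAll(regex)) {
    // match() returns null when there are no words, so fall back to [].
    const words = m[1].match(/\b\w+\b/g) || [];
    counts.push(words.length);
  }
  return counts;
}

// Shortened copy of the test string from the question.
const sample = [
  "*AUY:   today is holiday so Peter and Mary do not need to go to work .",
  "%snd:   <00:00:00><00:07:37>",
  "%AUY:   ok_pfp (0.40) er today is holiday",
].join("\n");

console.log(countWordsAfterId(sample)); // a single count, 14, for the *AUY line
```

Because the trailing "." is not matched by \w+, it is not counted as a word here.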

Upvotes: 2

Ganesan Palanisamy

Reputation: 41

Use the following regex:

(^.*\*AUY.*$)

You can check it here
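
Note that in JavaScript this pattern only finds the line when the m (multiline) flag is set, because otherwise ^ and $ match only at the start and end of the whole string. A minimal sketch, using a shortened copy of the question's test string (the variable names are illustrative only):

```javascript
// Shortened copy of the test string from the question.
const text = [
  "*AUY:   today is holiday so Peter and Mary do not need to go to work .",
  "%snd:   <00:00:00><00:07:37>",
  "%AUY:   ok_pfp (0.40) er today is holiday errfr ::: so er Peter and Mary",
].join("\n");

// Without m, $ cannot match at the end of the first line, so nothing is found.
const withoutM = text.match(/(^.*\*AUY.*$)/g) || [];

// With m, ^ and $ match at line boundaries and the *AUY line is returned.
const withM = text.match(/(^.*\*AUY.*$)/gm) || [];

console.log(withoutM.length); // 0
console.log(withM.length);    // 1
console.log(withM[0].startsWith("*AUY:")); // true
```

The %AUY line is skipped in both cases because \* requires a literal asterisk.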

Upvotes: 0

Joseph Myers

Reputation: 6552

Let x be your string. Then

(x.match(/(^|\n)\*AUY[^\r\n]*/g) || [])
    .map(
        function(s) { return s.match(/\S+/g).length; }
    );

This returns an array containing the number of word-like constructs in each line that begins with the string '*AUY'.

Explanation:

The regular expression looks for the string *AUY at the beginning of the string or directly after any newline (i.e., at the beginning of a line, even if that line is not at the beginning of the string), followed by any characters other than CR or LF (i.e., the rest of that line).

The idiom || [] after a match is performed will return an empty array if the match value is null, thus preventing an error when an array is expected instead of a null value.
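
A small sketch of why the fallback matters, using an input with no *AUY line (the variable names are made up for illustration):

```javascript
// No line begins with "*AUY", so match() with the g flag returns null.
const noMatch = "%snd:   <00:00:00><00:07:37>".match(/(^|\n)\*AUY[^\r\n]*/g);
console.log(noMatch); // null

// Calling noMatch.map(...) directly would throw a TypeError.
// With the fallback, .map runs on an empty array and returns an empty array.
const safe = noMatch || [];
const counts = safe.map(function (s) { return s.match(/\S+/g).length; });
console.log(counts.length); // 0
```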

The final step .map operates on each element of the matched array and counts the non-whitespace matches and returns these counts as a new array. Note that we do not need to protect this match with the || [] idiom because a null match is impossible, due to the fact that the line contains at minimum the non-whitespace string *AUY.
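
For illustration, the expression can be wrapped in a function. The sketch below is one possible shape, not a definitive implementation: the name countTokensInIdLines and the excludeId option (which drops the leading *AUY: token from each count) are assumptions that do not appear in the code above.

```javascript
// Hypothetical wrapper around the expression above.
function countTokensInIdLines(x, excludeId = false) {
  return (x.match(/(^|\n)\*AUY[^\r\n]*/g) || []).map(function (s) {
    // Never null here: every matched line contains at least "*AUY".
    const n = s.match(/\S+/g).length;
    // Optionally leave out the first token (the ID itself).
    return excludeId ? n - 1 : n;
  });
}

// Shortened copy of the test string from the question.
const x = [
  "*AUY:   today is holiday so Peter and Mary do not need to go to work .",
  "%snd:   <00:00:00><00:07:37>",
  "%AUY:   ok_pfp (0.40) er today is holiday",
].join("\n");

console.log(countTokensInIdLines(x));       // one count: 16 (includes "*AUY:" and ".")
console.log(countTokensInIdLines(x, true)); // one count: 15
```

Since \S+ counts every run of non-whitespace characters, the final "." is included; a stricter definition of "word" would need a different inner pattern.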

You can work with this code as a starting point to do what you actually want to do. Good luck!

Upvotes: 3
