Reputation: 1988
I have a Node.js script that reads in a file and counts word frequencies. I currently feed each line into a function:
function getWords(line) {
  return line.match(/\b\w+\b/g);
}
This matches almost everything, except that it splits contractions:
getWords("I'm") -> {"I", "m"}
However, I cannot simply allow apostrophes inside words, because I want matched (paired) apostrophes to act as word boundaries:
getWords("hey'there'") -> {"hey", "there"}
Is there a way to capture contractions while still treating other apostrophes as word boundaries?
Upvotes: 6
Views: 6205
Reputation: 104820
You can match letters and a possible apostrophe followed by letters.
line.match(/[A-Za-z]+('[A-Za-z]+)?/g)
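To see how this pattern behaves on the inputs from the question, here is a minimal sketch that wraps it in the asker's getWords function. The non-capturing group and the "|| []" fallback (match returns null when nothing matches) are my additions, not part of the answer above.

```javascript
// Sketch: letters, optionally followed by one apostrophe plus more letters.
// Unlike \w, [A-Za-z] excludes digits and underscores.
function getWords(line) {
  // String.prototype.match returns null on no match, so fall back to [].
  return line.match(/[A-Za-z]+(?:'[A-Za-z]+)?/g) || [];
}

console.log(getWords("I'm"));                 // [ "I'm" ]
console.log(getWords("it's 'quoted' text"));  // [ "it's", 'quoted', 'text' ]
console.log(getWords("hey'there'"));          // [ "hey'there" ]
```

Note that the last call keeps hey'there as a single word, so an apostrophe with letters on both sides is always treated as part of the word, whether or not it is a real contraction.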
Upvotes: 3
Reputation: 19571
The closest I believe you could get with regex would be line.match(/(?!'.*')\b[\w']+\b/g), but be aware that if there is no space between a word and a ', it will get treated as a contraction.
As Aaron Dufour mentioned, there would be no way for the regex by itself to know that I'm is a contraction but hey'there isn't.
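Since the pattern alone cannot tell the two cases apart, one possible workaround is to accept an apostrophe only when it is followed by a known contraction ending. The sketch below assumes a small, English-only list of endings; the list, CONTRACTION_ENDINGS, WORD_PATTERN, and getWordsWithKnownContractions are hypothetical names introduced for illustration and are not from either answer.

```javascript
// Assumption: a hand-picked list of English endings that may follow the apostrophe.
const CONTRACTION_ENDINGS = ["m", "s", "d", "t", "ll", "re", "ve"];

// Letters, optionally followed by an apostrophe and one known ending.
// The trailing \b stops "'t" from matching the start of "'there".
const WORD_PATTERN = new RegExp(
  "[A-Za-z]+(?:'(?:" + CONTRACTION_ENDINGS.join("|") + ")\\b)?",
  "gi"
);

function getWordsWithKnownContractions(line) {
  // match returns null when nothing matches, so fall back to [].
  return line.match(WORD_PATTERN) || [];
}

console.log(getWordsWithKnownContractions("I'm"));           // [ "I'm" ]
console.log(getWordsWithKnownContractions("they're here"));  // [ "they're", 'here' ]
console.log(getWordsWithKnownContractions("hey'there'"));    // [ 'hey', 'there' ]
```

This is still a heuristic: a word directly followed by a quoted s or t (for example hey's) would be counted as a contraction, and possessives such as Tom's are kept as one word.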
Upvotes: 5