Reputation: 1505
I'm trying to get an array of words from a string like this:
"Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\"."
The array is supposed to look like this:
[
"exclamation",
"question",
"quotes",
"apostrophe",
"wasn't",
"couldn't",
"didn't"
]
Currently I'm using this expression:
sentence.toLowerCase().replace(/[^\w\s]/gi, "").split(" ");
The problem is, it removes apostrophes from words like "wasn't", turning it into "wasnt".
I can't figure out how to keep the apostrophes in words such as that.
Any help would be greatly appreciated!
var sentence = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
console.log(sentence.toLowerCase().replace(/[^\w\s]/gi, "").split(" "));
Upvotes: 3
Views: 5094
Reputation: 48711
It would be tricky to adapt your replace-based solution, but you could match the words directly and allow apostrophes inside them:
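const sentence = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
// Lowercase first, then match runs of word characters optionally joined by apostrophes
console.log(
sentence.toLowerCase().match(/\w+(?:'\w+)*/g)
);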
Note: I changed the quantifier from ? to * to allow multiple apostrophes (') in a word.
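To illustrate what the quantifier change does, here is a minimal sketch comparing the two variants. The sample string and the variable names are my own, chosen only for illustration; they are not from the question.

```javascript
// Illustrative sample (an assumption, not from the question):
// one word with two apostrophes, one contraction, one quoted word.
const sample = "Rock'n'roll wasn't 'quoted'.".toLowerCase();

// ? allows at most one apostrophe group per word
const withOptional = sample.match(/\w+(?:'\w+)?/g);

// * allows any number of apostrophe groups per word
const withStar = sample.match(/\w+(?:'\w+)*/g);

console.log(withOptional); // [ "rock'n", "roll", "wasn't", "quoted" ]
console.log(withStar);     // [ "rock'n'roll", "wasn't", "quoted" ]
```

In both variants an apostrophe only counts when it sits between word characters, so the surrounding quotes of 'quoted' are dropped while the one in wasn't is kept.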
Upvotes: 4
Reputation: 14927
@revo's answer looks good. Here's another option that should also work:
const input = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
console.log(input.toLowerCase().match(/\b[\w']+\b/g));
Explanation:
\b matches at the beginning or end of a word,
[\w']+ matches one or more letters, digits, underscores, or apostrophes (to omit underscores, you can use [a-zA-Z0-9'] instead),
/g tells the regex to find all occurrences that match the pattern (not just the first one).
Upvotes: 2