Reputation: 3392
I have a string:
He was feeling a bit chilly at home and asked the assistant “What’s the temperature inside?” He figured that the system would figure out the temperature from his Nest thermostat and report that back to him. Instead, the Google Assistant went and fetched the weather report for, a resort town in Turkey.
This is my function to get all capitalized words:
public static function getUpperCase($str) {
    // Match every word that starts with an uppercase ASCII letter.
    preg_match_all('/\b[A-Z][a-zA-Z]*\b/', $str, $matches);
    return $matches[0];
}
My output is:
1: "He"
2: "What"
3: "He"
4: "Nest"
5: "Instead"
6: "Google"
7: "Assistant"
8: "Turkey"
How can I get n-grams like this instead:
1: "He"
2: "What"
3: "He"
4: "Nest"
5: "Instead"
6: "Google Assistant"
7: "Turkey"
So I want to group capitalized words together when no other words come between them in the sentence.
Upvotes: 1
Views: 53
Reputation: 162771
You can extend your regex so that, after the initial capitalized word, it greedily matches whitespace followed by another capitalized word as many times as it can.
public static function getUpperCase($str) {
    // A capitalized word, then zero or more "whitespace + capitalized word" repeats.
    preg_match_all('/\b[A-Z][a-zA-Z]*(\s+[A-Z][a-zA-Z]*)*\b/', $str, $matches);
    return $matches[0];
}
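To see the grouping in action, here is a minimal standalone sketch. The function name `getCapitalizedGroups` and the shortened sample string are my own assumptions for illustration, not part of the original code. It uses the same pattern, with a non-capturing group `(?:...)` since only the full match in `$matches[0]` is read.

```php
<?php
// Hypothetical standalone helper (name chosen for this example only).
// Same pattern as the method above, with a non-capturing group because
// only the full matches in $matches[0] are returned.
function getCapitalizedGroups(string $str): array
{
    preg_match_all('/\b[A-Z][a-zA-Z]*(?:\s+[A-Z][a-zA-Z]*)*\b/', $str, $matches);
    return $matches[0];
}

// Shortened excerpt of the string from the question.
$sample = 'He figured that the system would figure out the temperature '
        . 'from his Nest thermostat. Instead, the Google Assistant went '
        . 'and fetched the weather report.';

print_r(getCapitalizedGroups($sample));
// Matches, in order: "He", "Nest", "Instead", "Google Assistant"
```

A few caveats: any punctuation between two capitalized words (such as the comma after "Instead") ends the group, while `\s+` also matches line breaks, so a group can span lines. A capitalized sentence opener directly followed by a capitalized name would be merged into one match. The character classes are ASCII-only; for accented letters you would probably want Unicode properties such as `\p{Lu}` together with the `u` modifier.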
Upvotes: 2