Reputation: 175
I need to split a few strings into arrays on connecting words such as on, in, from, etc.
string sampleString = "what was total sales for pencils from Japan in 1999";
Desired result:
what was total sales
for pencils
from Japan
in 1999
I am familiar with splitting a string on one word, but not on several at the same time:
string[] stringArray = sampleString.Split(new string[] {"of"}, StringSplitOptions.None);
Any suggestions?
Upvotes: 1
Views: 77
Reputation: 391306
For this particular scenario you can use regular expressions.
You will have to use a lookahead pattern, because otherwise the words you're splitting on would be removed from the results.
Here's a small LINQPad program that demonstrates:
void Main()
{
    string sampleString = "what was total sales for pencils from Japan in 1999";
    // The second \b sits inside the lookahead, so only whole words trigger a split
    Regex.Split(sampleString, @"\b(?=(?:of|for|in|from)\b)").Dump();
}
Output:
what was total sales
for pencils
from Japan
in 1999
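To see why the lookahead matters, here is a minimal sketch that contrasts splitting on the words themselves with splitting at the position in front of them. The class and method names (LookaheadDemo, SplitConsuming, SplitKeeping) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Splits ON the words: each matched word is consumed and dropped from the pieces.
    public static string[] SplitConsuming(string input) =>
        Regex.Split(input, @"\b(?:of|for|in|from)\b");

    // Splits at the zero-width position IN FRONT OF each word: the words are kept.
    public static string[] SplitKeeping(string input) =>
        Regex.Split(input, @"\b(?=(?:of|for|in|from)\b)");
}
```

With the sample string, SplitConsuming loses "for", "from" and "in", while SplitKeeping leaves each of them at the start of its piece. Note that every piece except the last keeps the space that followed it in the input, so call Trim() on the pieces if that matters.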
But, as I said in the comments, it's going to be tripped up by things like place names that contain any of the words you split on, so:
string sampleString = "what was total sales for pencils from the Isle of Islay in 1999";
Regex.Split(sampleString, @"\b(?=(?:of|for|in|from)\b)").Dump();
Output:
what was total sales
for pencils
from the Isle
of Islay
in 1999
The regular expression can be rewritten like this to be more expressive for future maintenance:
Regex.Split(sampleString, @"
    \b          # Must be a word boundary here, so we never split in the middle of a word
    (?=         # Lookahead group: has to match, but is zero-length, so nothing is consumed
        (?:     # Non-capturing group around the list of words, separated by the OR operator, |
            of
            |for
            |in
            |from
        )
        \b      # Word boundary after the word, so 'fortune' does not count as 'for'
    )", RegexOptions.IgnorePatternWhitespace).Dump();
You might also want to add RegexOptions.IgnoreCase to the options, to match "Of" and "OF", etc.
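Putting those pieces together, a reusable version might look like the sketch below. KeywordSplitter and SplitBefore are hypothetical names, and trimming the pieces and dropping empty ones are my own assumptions about what you'd want, not something the question asked for.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordSplitter
{
    // Hypothetical helper: splits 'input' in front of every whole-word occurrence
    // of any entry in 'words', ignoring case, and trims the resulting pieces.
    public static string[] SplitBefore(string input, params string[] words)
    {
        // Escape each word so regex metacharacters in it are treated literally.
        string alternation = string.Join("|", words.Select(Regex.Escape));

        // \b(?=(?:...)\b) matches the empty position right before a complete word.
        string pattern = @"\b(?=(?:" + alternation + @")\b)";

        return Regex.Split(input, pattern, RegexOptions.IgnoreCase)
                    .Select(piece => piece.Trim())
                    .Where(piece => piece.Length > 0) // assumption: empty pieces are unwanted
                    .ToArray();
    }
}
```

Calling KeywordSplitter.SplitBefore(sampleString, "of", "for", "in", "from") on the first sample gives the four pieces from the question without trailing spaces. It still cannot tell a place name like "Isle of Islay" from a real split point, so the caveat above applies here too.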
Upvotes: 5