Reputation: 4408
I am trying to split the following string into groups of items with consecutive numbers:
word 7, word 8, word 9, word 14
So I get:
word 7, word 8, word 9
word 14
using regular expressions.
What I did was to use the pattern (word (?<num>\d+),?\s*)+
and then compare the captured numbers in code.
Is it possible for a regular expression to directly extract only the parts with consecutive numbers?
Upvotes: 0
Views: 941
Reputation: 20772
LINQ is pretty handy for sequences of all kinds. It has many useful operators but you can also define your own. Here's how you could use it:
"word 10, word 11, word 7, word 8, word 9, word 14, word 2"
    .Split(new[] { ", " }, StringSplitOptions.RemoveEmptyEntries)
    .ToPartitionsOfConsecutiveValues(w => Int32.Parse(w.Split(' ').Last()))
    .Select(sequence => String.Join(", ", sequence))
    .ToArray()
    .Dump("Array of strings");
Dump is an extension method from LINQPad.
Here's the new operator:
public static class Partition {
    public static IEnumerable<List<T>> ToPartitionsOfConsecutiveValues<T>(
        this IEnumerable<T> source,
        Func<T,int> valueSelector)
    {
        var lastValue = (int?)null;
        List<T> lastList = null;
        foreach (var item in source)
        {
            var value = valueSelector(item);
            if (!(lastValue.HasValue))
            {
                lastList = new List<T>();
            }
            else if (lastValue.Value != value - 1)
            {
                yield return lastList;
                lastList = new List<T>();
            }
            lastValue = value;
            lastList.Add(item);
        }
        if (lastValue.HasValue) yield return lastList;
    }
}
Update based on comment by @L.B.
LINQ operators are most useful when they depend on as few concrete types as possible. Pulling out the predicate that uses the value type (int) allows the operator to be used in other cases.
Here is the same example:
// s is the input string from the first example.
Func<String,Int32> IntSuffix = w => Int32.Parse(w.Split(' ').Last());
Func<String, String, Boolean> breakPredicate
    = (prev, next) => IntSuffix(prev) != IntSuffix(next) - 1;
s.Split(new[] { ", " }, StringSplitOptions.RemoveEmptyEntries)
    .ToPartitionsOfSequences(breakPredicate)
    .Select(sequence => String.Join(", ", sequence))
The implementation:
public static IEnumerable<List<T>> ToPartitionsOfSequences<T>(
    this IEnumerable<T> source,
    Func<T, T, Boolean> breakPredicate)
{
    T lastItem = default(T);
    List<T> lastList = null;
    foreach (var item in source)
    {
        if (lastList == null)
        {
            lastList = new List<T>();
        }
        else if (breakPredicate(lastItem, item))
        {
            yield return lastList;
            lastList = new List<T>();
        }
        lastItem = item;
        lastList.Add(item);
    }
    if (lastList != null) yield return lastList;
}
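To illustrate the point about reuse, here is a minimal sketch that applies the same operator to a different item type (char instead of String). The class name PartitionDemo and the helper AlphabetRuns are hypothetical, introduced only for this illustration; the operator itself is copied from above so that the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionDemo
{
    // Copy of the operator shown above, repeated so this sketch is self-contained.
    public static IEnumerable<List<T>> ToPartitionsOfSequences<T>(
        this IEnumerable<T> source,
        Func<T, T, bool> breakPredicate)
    {
        T lastItem = default(T);
        List<T> lastList = null;
        foreach (var item in source)
        {
            if (lastList == null)
            {
                lastList = new List<T>();
            }
            else if (breakPredicate(lastItem, item))
            {
                yield return lastList;
                lastList = new List<T>();
            }
            lastItem = item;
            lastList.Add(item);
        }
        if (lastList != null) yield return lastList;
    }

    // Hypothetical helper: splits a string into runs of consecutive characters.
    // A string is an IEnumerable<char>, so T is inferred as char here.
    public static List<string> AlphabetRuns(string letters)
    {
        return letters
            .ToPartitionsOfSequences((prev, next) => next != prev + 1)
            .Select(run => new string(run.ToArray()))
            .ToList();
    }
}
```

With this sketch, PartitionDemo.AlphabetRuns("abcxyk") yields "abc", "xy" and "k". Only the break predicate changed; the partitioning logic is untouched.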
Upvotes: 1
Reputation: 1582
Alternatively, you could use:
string words = "word 7, word 8, word 9, word 14";
string[] splittedWords = Regex.Split(words, ", "); //Separating words.
List<string> sortedWords = new List<string>();
int currentWordNumber = 0, lastWordNumber = 0;
foreach (string sptw in splittedWords)
{
    if (sortedWords.Count == 0) //No value has been written to the list yet, so:
    {
        sortedWords.Add(sptw);
        lastWordNumber = int.Parse(sptw.Split(' ')[1]); //Storing the number of the word for checking it later.
    }
    else
    {
        currentWordNumber = int.Parse(sptw.Split(' ')[1]);
        if (currentWordNumber == lastWordNumber + 1)
            sortedWords[sortedWords.Count - 1] += ", " + sptw;
        else
            sortedWords.Add(sptw);
        lastWordNumber = currentWordNumber; //Storing the number of the word for checking it later.
    }
}
At the end, the list sortedWords will have:
"word 7, word 8, word 9"
"word 14"
Upvotes: 1
Reputation: 6008
This is not possible using regular expressions alone, because regular expressions can only describe regular languages.
Regular languages, among other limitations, cannot carry context from one part of the input to the next, which in your case would be the most recently matched number in the string.
For more info on language and grammar theory, see the Chomsky hierarchy.
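A common workaround is to let the regex find each item and do the number comparison in ordinary code. The following is a minimal sketch of that combination, assuming every item has the form "word N" as in the question. The class and method names (ConsecutiveRuns, Split) are hypothetical and chosen only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ConsecutiveRuns
{
    // Hypothetical helper: the regex only locates each "word N" item;
    // the consecutive-number check happens in the loop, not in the pattern.
    public static List<string> Split(string input)
    {
        var runs = new List<string>();
        int? previous = null;
        foreach (Match m in Regex.Matches(input, @"word (?<num>\d+)"))
        {
            int number = int.Parse(m.Groups["num"].Value);
            if (previous.HasValue && number == previous.Value + 1)
                runs[runs.Count - 1] += ", " + m.Value; // extend the current run
            else
                runs.Add(m.Value);                      // start a new run
            previous = number;
        }
        return runs;
    }
}
```

For the string from the question, ConsecutiveRuns.Split("word 7, word 8, word 9, word 14") returns two entries: "word 7, word 8, word 9" and "word 14".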
Upvotes: 1
Reputation: 6026
Since non-RegEx solutions are acceptable:
var data = "word 7, word 8, word 9, word 14";
// split the data into word and number
var dataCollection = data.Split(',').Select(d => new
{
    word = d.Trim().Split(' ')[0],
    number = int.Parse(d.Trim().Split(' ')[1])
}).ToList();
// store each set of consecutive results into a collection
List<string> resultsCollection = new List<string>();
var sb = new StringBuilder();
int i = 0;
while (i < dataCollection.Count)
{
    if (i > 0)
    {
        if (dataCollection[i].number == dataCollection[i-1].number + 1)
        {
            // same run: separate from the previous item
            if (sb.Length > 0) sb.Append(", ");
        }
        else
        {
            // run ended: flush it and start a new one
            resultsCollection.Add(sb.ToString());
            sb.Clear();
        }
    }
    sb.AppendFormat("{0} {1}", dataCollection[i].word, dataCollection[i].number);
    i++;
}
resultsCollection.Add(sb.ToString());
For your test data, resultsCollection will contain two items:
word 7, word 8, word 9
word 14
Upvotes: 1