Eric Yin

Reputation: 8973

How do I split a string and then rejoin it?

I have the following string in C#.

"aaa,bbbb.ccc|dddd:eee"

I then split it with new char[] {',','.','|',':'}. How would I rejoin the pieces in the same order as before, with the same characters between them? The result should end up exactly the same as the original string.

EXAMPLE

string s = "aaa,bbbb.ccc|dddd:eee";
string[] s2 = s.Split(new char[] {',','.','|',':'});
// now s2 = {"aaa", "bbbb", "ccc", "dddd", "eee"}
// let's assume I've done some operation, and
// now s2 = {"xxx", "yyy", "zzz", "1111", "222"}

s = s2.MagicJoin(~~~~~~);  // I need this

// now s = "xxx,yyy.zzz|1111:222";

EDIT

The char[] in the sample above is just an example: in the real world the delimiters will not necessarily appear in that order, and not all of them will necessarily appear at the same time.

EDIT

Just a thought: how about using Regex.Split? First split by the char[] to get a string[] of values, then split by everything that is not in the char[] to get another string[] of delimiters, and later put them back together. That might work, but I don't know how to code it.
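A minimal sketch of that idea, assuming a single Regex.Split is acceptable instead of two: when the pattern passed to Regex.Split contains capturing parentheses, the captured text is included in the result array, so one split yields value, delimiter, value, ..., value. The even and odd entries can then be separated and later interleaved again. SplitKeepingDelimiters and MagicJoin here are hypothetical helper names, not framework methods.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class MagicSplitJoin
{
    // Hypothetical helper: splits on any of the delimiters and also returns
    // the delimiters that were found, in the order they appeared.
    public static (string[] Values, string[] Delimiters) SplitKeepingDelimiters(
        string input, char[] delimiters)
    {
        if (delimiters.Length == 0)
            return (new[] { input }, new string[0]);

        // Builds e.g. "(,|\.|\||:)"; the capturing group makes Regex.Split
        // include each matched delimiter in the returned array.
        string pattern = "(" + string.Join("|",
            delimiters.Select(c => Regex.Escape(c.ToString()))) + ")";
        string[] parts = Regex.Split(input, pattern);

        // parts alternates value, delimiter, value, ..., value
        // (leading/trailing delimiters produce empty values at the ends).
        string[] values = parts.Where((_, i) => i % 2 == 0).ToArray();
        string[] found = parts.Where((_, i) => i % 2 == 1).ToArray();
        return (values, found);
    }

    // Hypothetical helper: interleaves values and delimiters back together.
    public static string MagicJoin(string[] values, string[] delimiters)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < values.Length; i++)
        {
            sb.Append(values[i]);
            if (i < delimiters.Length)
                sb.Append(delimiters[i]);
        }
        return sb.ToString();
    }
}
```

For the sample string, SplitKeepingDelimiters returns the values {"aaa", "bbbb", "ccc", "dddd", "eee"} and the delimiters {",", ".", "|", ":"}; replacing the values with {"xxx", "yyy", "zzz", "1111", "222"} and calling MagicJoin gives "xxx,yyy.zzz|1111:222". Because the delimiters are taken from the string itself, their order and which ones appear do not need to be known in advance.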

Upvotes: 4

Views: 1698

Answers (4)

Andras Zoltan

Reputation: 42333

Here you go - this works with any combination of delimiters in any order, and also allows for the situation where a delimiter is not actually found in the string. It's taken me a while to come up with this and, having posted it, it looks more complex than any other answer!

Ah well, I'll keep it here anyway.

public static string SplitAndReJoin(string str, char[] delimiters, 
  Func<string[], string[]> mutator)
{
  //first thing to know is which of the delimiters are 
  //actually in the string, and in what order
  //Using ToArray() here to get the total count of found delimiters
  var delimitersInOrder = (from ci in
                            (from c in delimiters
                             from i in FindIndexesOfAll(str, c)
                             select new { c, i })
                          orderby ci.i
                          select ci.c).ToArray();
  if (delimitersInOrder.Length == 0)
    return str;

  //now split and mutate the string
  string[] strings = str.Split(delimiters);
  strings = mutator(strings);
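  //now build a format string, e.g. "{0},{1}.{2}|{3}:" for the sample input
  //(a '{' or '}' delimiter is doubled so string.Format treats it literally)
  //note - this operation is much more complicated if you wish to use 
  //StringSplitOptions.RemoveEmptyEntries
  string formatStr = string.Join("",
    delimitersInOrder.Select((c, i) => string.Format("{{{0}}}", i)
      + (c == '{' || c == '}' ? new string(c, 2) : c.ToString())));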
  //deals with the 'perfect' split - i.e. there's always two values
  //either side of a delimiter
  if (strings.Length > delimitersInOrder.Length)
    formatStr += string.Format("{{{0}}}", strings.Length - 1);

  return string.Format(formatStr, strings);
}

public static IEnumerable<int> FindIndexesOfAll(string str, char c)
{
  int startIndex = 0;
  int lastIndex = -1;

  while(true)
  {
    lastIndex = str.IndexOf(c, startIndex);
    if (lastIndex != -1)
    {
      yield return lastIndex;
      startIndex = lastIndex + 1;
    }
    else
      yield break;
  }
}

And here's a test you can use to validate it:

[TestMethod]
public void TestSplitAndReJoin()
{
  //note - mutator does nothing
  Assert.AreEqual("a,b", SplitAndReJoin("a,b", ",".ToCharArray(), s => s));
  //insert a 'z' in front of every sub string.
  Assert.AreEqual("zaaa,zbbbb.zccc|zdddd:zeee", SplitAndReJoin("aaa,bbbb.ccc|dddd:eee",
    ",.|:".ToCharArray(), s => s.Select(ss => "z" + ss).ToArray()));
  //re-ordering of delimiters + mutate
  Assert.AreEqual("zaaa,zbbbb.zccc|zdddd:zeee", SplitAndReJoin("aaa,bbbb.ccc|dddd:eee",
    ":|.,".ToCharArray(), s => s.Select(ss => "z" + ss).ToArray()));
  //now how about leading or trailing results?
  Assert.AreEqual("a,", SplitAndReJoin("a,", ",".ToCharArray(), s => s));
  Assert.AreEqual(",b", SplitAndReJoin(",b", ",".ToCharArray(), s => s));
}

Note that I've assumed you need to be able to do something with the elements of the array, to manipulate the individual strings before joining them back together again - otherwise presumably you would just keep the original string!

The method builds a dynamic format string. No warranty here on efficiency :)
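To make the "dynamic format string" concrete, here is a cut-down sketch (BuildFormat is a hypothetical name) that produces the same shape of format string as the answer for the sample input: the delimiters ',', '.', '|', ':' give "{0},{1}.{2}|{3}:{4}", and string.Format then drops the (possibly mutated) values into the numbered slots. It assumes the delimiters are already in the order they occur and that none of them is a brace.

```csharp
using System;
using System.Linq;

public static class FormatStringDemo
{
    // "{i}" followed by the i-th delimiter found, then one final placeholder
    // for the value after the last delimiter.
    // Assumes no delimiter is '{' or '}', which string.Format would misread.
    public static string BuildFormat(char[] delimitersInOrder)
    {
        string format = string.Concat(
            delimitersInOrder.Select((c, i) => "{" + i + "}" + c));
        return format + "{" + delimitersInOrder.Length + "}";
    }
}
```

Calling string.Format on that format string with the values "xxx", "yyy", "zzz", "1111", "222" produces "xxx,yyy.zzz|1111:222".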

Upvotes: 3

yamen

Reputation: 15618

Here's MagicSplit:

public IEnumerable<Tuple<string, char?>> MagicSplit(string input, char[] split)
{
    var buffer = new StringBuilder();
    foreach (var c in input)
    {
        if (split.Contains(c))
        {
            // end of a value: emit it together with the delimiter that ended it
            var result = buffer.ToString();
            buffer.Clear();
            yield return Tuple.Create(result, (char?)c);
        }
        else
        {
            buffer.Append(c);
        }
    }
    // the last value has no delimiter after it (null, so no trailing space is added)
    yield return Tuple.Create(buffer.ToString(), (char?)null);
}

And two types of MagicJoin:

public string MagicJoin(IEnumerable<Tuple<string, char?>> split)
{
    // a null char? turns into "" via ToString(), so nothing is appended after the last value
    return split.Aggregate(new StringBuilder(), (sb, tup) => sb.Append(tup.Item1).Append(tup.Item2.ToString())).ToString();
}

public string MagicJoin(IEnumerable<string> strings, IEnumerable<char?> chars)
{
    return strings.Zip(chars, (s, c) => s + c.ToString()).Aggregate(new StringBuilder(), (sb, s) => sb.Append(s)).ToString();
}

Usages:

var s = "aaa,bbbb.ccc|dddd:eee";

// simple
var split = MagicSplit(s, new char[] {',','.','|',':'}).ToArray();
var joined = MagicJoin(split);    // "aaa,bbbb.ccc|dddd:eee"

// if you want to change the strings
var strings = split.Select(tup => tup.Item1).ToArray();
var chars = split.Select(tup => tup.Item2).ToArray();
strings[0] = "test";
var changed = MagicJoin(strings, chars);    // "test,bbbb.ccc|dddd:eee"

Upvotes: 3

Mirko

Reputation: 4282

How about this?


var x = "aaa,bbbb.ccc|dddd:eee";

var matches = Regex.Matches(x, "(?<Value>[^\\.,|\\:]+)(?<Separator>[\\.,|\\:]?)");

var result = new StringBuilder();

foreach (Match match in matches)
{
    result.AppendFormat("{0}{1}", match.Groups["Value"], match.Groups["Separator"]);
}

Console.WriteLine(result.ToString());
Console.ReadLine();

Or if you love LINQ (which I do):


var x = "aaa,bbbb.ccc|dddd:eee";
var matches = Regex.Matches(x, "(?<Value>[^\\.,|\\:]+)(?<Separator>[\\.,|\\:]?)");
var reassembly = matches.Cast<Match>().Aggregate(new StringBuilder(), (a, v) => a.AppendFormat("{0}{1}", v.Groups["Value"], v.Groups["Separator"])).ToString();
Console.WriteLine(reassembly);
Console.ReadLine();

Needless to say, you could do something with the parts before reassembling them, which I presume is the point of this exercise.
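A sketch of doing exactly that with the LINQ version, assuming the transformation is a simple Func&lt;string, string&gt; applied to each Value group while the Separator group is copied through unchanged. Reassemble is a hypothetical name; the pattern is the one above, written as a verbatim string.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class RegexReassembly
{
    // A run of non-delimiters followed by an optional delimiter.
    // Because Value uses '+', empty items (e.g. the middle of "a,,b") are not matched.
    const string Pattern = @"(?<Value>[^.,|:]+)(?<Separator>[.,|:]?)";

    public static string Reassemble(string input, Func<string, string> transform)
    {
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            .Aggregate(new StringBuilder(), (sb, m) =>
                sb.Append(transform(m.Groups["Value"].Value))   // changed part
                  .Append(m.Groups["Separator"].Value))         // original delimiter
            .ToString();
    }
}
```

For example, Reassemble("aaa,bbbb.ccc|dddd:eee", v => "z" + v) gives "zaaa,zbbbb.zccc|zdddd:zeee".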

Upvotes: 1

porges

Reputation: 30580

It might be easier to do this with the Regex class:

input = Regex.Replace(input, @"[^,.|:]+", DoSomething);

Where DoSomething is a method or lambda that transforms the item in question, e.g.:

string DoSomething(Match m)
{
    return m.Value.ToUpper();
}

For this example the output string for "aaa,bbbb.ccc|dddd:eee" would be "AAA,BBBB.CCC|DDDD:EEE".

If you use a lambda you can very easily keep state around, like this:

int i = 0;
Console.WriteLine(Regex.Replace("aaa,bbbb.ccc|dddd:eee", @"[^,.|:]+",
    _ => (++i).ToString()));

Outputs:

1,2.3|4:5

It just depends on what kind of transformation you're doing to the items.
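For the question's scenario, where the new values already sit in an array like s2, the same stateful-lambda trick can hand them out in order. A sketch, assuming the same delimiter set as above and one replacement per non-empty item (ReplaceItems is a hypothetical name):

```csharp
using System;
using System.Text.RegularExpressions;

public static class ArrayReplace
{
    // The i-th run of non-delimiters is replaced by replacements[i]; the delimiters
    // are never matched, so they stay exactly where they were.
    // Empty items (e.g. in "a,,b") are not matched by '+', so indexes can differ
    // from the ones string.Split would give.
    public static string ReplaceItems(string input, string[] replacements)
    {
        int i = 0;
        return Regex.Replace(input, @"[^,.|:]+", m => replacements[i++]);
    }
}
```

With the question's data, ReplaceItems("aaa,bbbb.ccc|dddd:eee", new[] { "xxx", "yyy", "zzz", "1111", "222" }) returns "xxx,yyy.zzz|1111:222".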

Upvotes: 3
