Reputation: 461

Find common prefix of strings

I am having 4 strings:

"h:/a/b/c"
"h:/a/b/d"
"h:/a/b/e"
"h:/a/c"

I want to find the common prefix for those strings, i.e. "h:/a". How to find that?

Usually I'd split the string with delimiter '/' and put it in another list, and so on.
Is there any better way to do it?

Upvotes: 36

Answers (16)

Yegor

Reputation: 2606

A short LINQy solution of mine (using MinBy from .NET 6).

var samples = new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/e" };
    
var commonPrefix = new string(samples.MinBy(s => s.Length)
        .TakeWhile((c, i) => samples.All(s => s[i] == c)).ToArray());

For .NET 5 and older use this:

var samples = new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/e" };
    
var commonPrefix = new string(
    samples.First().Substring(0, samples.Min(s => s.Length))
    .TakeWhile((c, i) => samples.All(s => s[i] == c)).ToArray());

Upvotes: 29

Alex Veremeenko

Reputation: 356

public string LongestCommonPrefix(string[] strs)
{
    return strs.Aggregate((seed,z) => string.Join("",seed.TakeWhile((v,i)=> z.Length > i && v == z[i])));
}

Upvotes: 1

3dGrabber

Reputation: 5074

I'm late to the party, but I'll give my 2 cents:

public static String CommonPrefix(String str, params String[] more)
{
    var prefixLength = str
                      .TakeWhile((c, i) => more.All(s => i < s.Length && s[i] == c))
                      .Count();

    return str.Substring(0, prefixLength);
}

Explanation:

This works by walking through the chars of str as long as All the other strings have the same char c at index i.
The signature split in String and params String[] ensures that at least one string is provided, no runtime checks needed.
If called with only one string, the function will return the input (a string is its own prefix).
It's cheaper to Count the prefix length and return Substring(0, prefixLength) than to reassemble the enumerated chars by means of String.Join() or Enumerable.Aggregate()

Upvotes: 1

Alex Telon

Reputation: 1202

An improvement on Yegor's answer

var samples = new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/e" };

var commonPrefix = new string(
    samples.Min().TakeWhile((c, i) => samples.All(s => s[i] == c)).ToArray());

First we know that the longest common prefix cannot be longer than the shortest element. So take the shortest one and take chars from it while all other strings have the same char at the same position. In the extreme case we take all chars from the shortest element. By iterating over the shortest element the index lookup will not throw any exceptions.

Another (worse but still interesting) way to solve it using LINQ would be this:

samples.Aggregate(samples.Min(), (current, next) => new string(current.TakeWhile((c,i) => next[i] == c).ToArray() ));

This one works by creating a commonPrefix and comparing that to each element one by one. In each comparison the commonPrefix is either maintained or decreased. In the first iteration current is the min element, but each iteration after that it is the best commonPrefix found so far. Think of this as a depth first solution while the first one is a breadth first.

This type of solution might be improved upon by sorting the samples in terms of length so that the shortest elements are compared first.

This type of solution cannot really be better than the first however. In the best case this is as-good as the first solution. But otherwise it will do extra work by finding temporary commonPrefixes that are longer than necessary.

Upvotes: 0

Shiran Abbasi

Reputation: 99

This is a simple method that finds common string prefix.

public static string GetCommonStartingPath(string[] keys)
        {
            Array.Sort(keys, StringComparer.InvariantCulture);
            string a1 = keys[0], a2 = keys[keys.Length - 1];
            int L = a1.Length, i = 0;
            while (i < L && a1[i] == a2[i])
            {
                i++;
            }

            string result = a1.Substring(0, i);

            return result;
        }

Upvotes: 0

dmitry1100

Reputation: 1329

Here I implemented quite efficient method when you must analyse huge amount of strings, I'm caching counts and lengths here as well which improves performance by about 1,5x on my tests comparing to properties access in loops:

using System.Collections.Generic;
using System.Text;

........

public static string GetCommonPrefix ( IList<string> strings )
{
    var stringsCount = strings.Count;
    if( stringsCount == 0 )
        return null;
    if( stringsCount == 1 )
        return strings[0];

    var sb = new StringBuilder( strings[0] );
    string str;
    int i, j, sbLen, strLen;

    for( i = 1; i < stringsCount; i++ )
    {
        str = strings[i];

        sbLen = sb.Length;
        strLen = str.Length;
        if( sbLen > strLen )
            sb.Length = sbLen = strLen;

        for( j = 0; j < sbLen; j++ )
        {
            if( sb[j] != str[j] )
            {
                sb.Length = j;
                break;
            }
        }
    }

    return sb.ToString();
}

UPD: I also implemented parallel version which uses above method as final step:

using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

........

public static string GetCommonPrefixParallel ( IList<string> strings )
{
    var stringsCount = strings.Count;
    if( stringsCount == 0 )
        return null;
    if( stringsCount == 1 )
        return strings[0];

    var firstStr = strings[0];
    var finalList = new List<string>();
    var finalListLock = new object();

    Parallel.For( 1, stringsCount,
        () => new StringBuilder( firstStr ),
        ( i, loop, localSb ) =>
        {
            var sbLen = localSb.Length;
            var str = strings[i];
            var strLen = str.Length;
            if( sbLen > strLen )
                localSb.Length = sbLen = strLen;

            for( int j = 0; j < sbLen; j++ )
            {
                if( localSb[j] != str[j] )
                {
                    localSb.Length = j;
                    break;
                }
            }

            return localSb;
        },
        ( localSb ) =>
        {
            lock( finalListLock )
            {
                finalList.Add( localSb.ToString() );
            }
        } );

    return GetCommonPrefix( finalList );
}

GetCommonPrefixParallel() boosts twice comparing to GetCommonPrefix() on huge strings amount and when strings length is significant. On small arrays with short strings GetCommonPrefix() works a little better. I tested on MacBook Pro Retina 13''.

Upvotes: 0

Dr.Gee

Reputation: 69

Top answer can be improved to ignore case:

.TakeWhile(s =>
  {
      var reference = s.First();
      return s.All(d => string.Equals(reference, d, StringComparison.OrdinalIgnoreCase));
  })

Upvotes: -1

Simon D

Reputation: 4290

Working code based on Sam Holder's solution (note it gives h:/a/ not h:/a as the longest common initial substring in the question):

using System;

namespace CommonPrefix
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(CommonPrefix(new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/c" })); // "h:/a/"
            Console.WriteLine(CommonPrefix(new[] { "abc", "abc" })); // "abc"
            Console.WriteLine(CommonPrefix(new[] { "abc" })); // "abc"
            Console.WriteLine(CommonPrefix(new string[] { })); // ""
            Console.WriteLine(CommonPrefix(new[] { "a", "abc" })); // "a"
            Console.WriteLine(CommonPrefix(new[] { "abc", "a" })); // "a"

            Console.ReadKey();
        }

        private static string CommonPrefix(string[] ss)
        {
            if (ss.Length == 0)
            {
                return "";
            }

            if (ss.Length == 1)
            {
                return ss[0];
            }

            int prefixLength = 0;

            foreach (char c in ss[0])
            {
                foreach (string s in ss)
                {
                    if (s.Length <= prefixLength || s[prefixLength] != c)
                    {
                        return ss[0].Substring(0, prefixLength);
                    }
                }
                prefixLength++;
            }

            return ss[0]; // all strings identical up to length of ss[0]
        }
    }
}

Upvotes: 7

NER1808

Reputation: 1969

I wrote this ICollection extension to find the longest common base Uri from a collection of web addresses.

As it only check the collection of strings at every slash it will be a bit faster that a generic prefix routine (Not counting my inefficient algorithm!). It's verbose, but easy to follow...my favourite type of code ;-)

Ignores 'http://' and 'https://', as well as case.

    /// <summary>
    /// Resolves a common base Uri from a list of Uri strings. Ignores case. Includes the last slash
    /// </summary>
    /// <param name="collectionOfUriStrings"></param>
    /// <returns>Common root in lowercase</returns>
    public static string GetCommonUri(this ICollection<string> collectionOfUriStrings)
    {
        //Check that collection contains entries
        if (!collectionOfUriStrings.Any())
            return string.Empty;
        //Check that the first is no zero length
        var firstUri = collectionOfUriStrings.FirstOrDefault();
        if(string.IsNullOrEmpty(firstUri))
            return string.Empty;

        //set starting position to be passed '://'
        int previousSlashPos = firstUri.IndexOf("://", StringComparison.OrdinalIgnoreCase) + 2;
        int minPos = previousSlashPos + 1; //results return must have a slash after this initial position
        int nextSlashPos = firstUri.IndexOf("/", previousSlashPos + 1, StringComparison.OrdinalIgnoreCase);
        //check if any slashes
        if (nextSlashPos == -1)
            return string.Empty;

        do
        {               
            string common = firstUri.Substring(0, nextSlashPos + 1);
            //check against whole collection
            foreach (var collectionOfUriString in collectionOfUriStrings)
            {
                if (!collectionOfUriString.StartsWith(common, StringComparison.OrdinalIgnoreCase))
                {
                    //return as soon as a mismatch is found
                    return previousSlashPos > minPos ? firstUri.Substring(0, previousSlashPos + 1).ToLower() : string.Empty ;
                }
            }
            previousSlashPos = nextSlashPos;                
            nextSlashPos = firstUri.IndexOf("/", previousSlashPos + 1, StringComparison.OrdinalIgnoreCase);
        } while (nextSlashPos != -1);

        return previousSlashPos > minPos ? firstUri.Substring(0, previousSlashPos + 1).ToLower() : string.Empty;
    }

Upvotes: 1

Graham Morris

Reputation: 180

I needed to look for the longest common prefix in dissimilar strings. I came up with:

private string FindCommonPrefix(List<string> list)
{
    List<string> prefixes = null;
    for (int len = 1; ; ++len)
    {
        var x = list.Where(s => s.Length >= len)
                    .GroupBy(c => c.Substring(0,len))
                    .Where(grp => grp.Count() > 1)
                    .Select(grp => grp.Key)
                    .ToList();

        if (!x.Any())
        {
            break;
        }
        //  Copy last list
        prefixes = new List<string>(x);
    }

    return prefixes == null ? string.Empty : prefixes.First();
}

If there is more than one prefix with the same length, it arbitrarily returns the first one found. Also it's case-sensitive. Both these points can be addressed by the reader!

Upvotes: 1

Bart Calixto

Reputation: 19705

My approach would be, take the first string. Get letter by letter while all the other string got the same letter on the same index position and stop if there is no match. Remove the last character if it's a separator.

var str_array = new string[]{"h:/a/b/c",
         "h:/a/b/d",
         "h:/a/b/e",
         "h:/a/c"};
var separator = '/';

// get longest common prefix (optinally use ToLowerInvariant)
var ret = str_array.Any() 
    ? str_array.First().TakeWhile((s,i) => 
         str_array.All(e => 
            Char.ToLowerInvariant(s) == Char.ToLowerInvariant(e.Skip(i).Take(1).SingleOrDefault()))) 
    : String.Empty;

// remove last character if it's a separator (optional)
if (ret.LastOrDefault() == separator) 
    ret = ret.Take(ret.Count() -1);

string prefix = new String(ret.ToArray());

Upvotes: 1

user896993

Reputation: 1351

Here is a custom implementation of the trie algorithm in c# (http://en.wikipedia.org/wiki/Trie). It is used to perform an indexed string via prefixes. This class has O(1) write and reads for leaf nodes. For prefix searches, the performance is O(log n), however the count of results for prefix is O(1).

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

public class StringIndex
{
    private Dictionary<char, Item> _rootChars;

    public StringIndex()
    {
        _rootChars = new Dictionary<char, Item>();
    }

    public void Add(string value, string data)
    {
        int level = 0;
        Dictionary<char, Item> currentChars = _rootChars;
        Item currentItem = null;

        foreach (char c in value)
        {
            if (currentChars.ContainsKey(c))
            {
                currentItem = currentChars[c];
            }
            else
            {
                currentItem = new Item() { Level = level, Letter = c };
                currentChars.Add(c, currentItem);                
            }

            currentChars = currentItem.Items;

            level++;
        }

        if (!currentItem.Values.Contains(data))
        {
            currentItem.Values.Add(data);
            IncrementCount(value);
        }
    }

    private void IncrementCount(string value)
    {
        Dictionary<char, Item> currentChars = _rootChars;
        Item currentItem = null;

        foreach (char c in value)
        {
            currentItem = currentChars[c];
            currentItem.Total++;
            currentChars = currentItem.Items;
        }
    }

    public void Remove(string value, string data)
    {
        Dictionary<char, Item> currentChars = _rootChars;
        Dictionary<char, Item> parentChars = null;
        Item currentItem = null;

        foreach (char c in value)
        {
            if (currentChars.ContainsKey(c))
            {
                currentItem = currentChars[c];
                parentChars = currentChars;
                currentChars = currentItem.Items;
            }
            else
            {
                return; // no matches found
            }
        }

        if (currentItem.Values.Contains(data))
        {
            currentItem.Values.Remove(data);
            DecrementCount(value);
            if (currentItem.Total == 0)
            {
                parentChars.Remove(currentItem.Letter);
            }
        }
    }

    private void DecrementCount(string value)
    {
        Dictionary<char, Item> currentChars = _rootChars;
        Item currentItem = null;

        foreach (char c in value)
        {
            currentItem = currentChars[c];
            currentItem.Total--;
            currentChars = currentItem.Items;
        }
    }

    public void Clear()
    {
        _rootChars.Clear();
    }

    public int GetValuesByPrefixCount(string prefix)
    {
        int valuescount = 0;

        int level = 0;
        Dictionary<char, Item> currentChars = _rootChars;
        Item currentItem = null;

        foreach (char c in prefix)
        {
            if (currentChars.ContainsKey(c))
            {
                currentItem = currentChars[c];
                currentChars = currentItem.Items;
            }
            else
            {
                return valuescount; // no matches found
            }
            level++;
        }

        valuescount = currentItem.Total;

        return valuescount;
    }

    public HashSet<string> GetValuesByPrefixFlattened(string prefix)
    {
        var results = GetValuesByPrefix(prefix);
        return new HashSet<string>(results.SelectMany(x => x));
    }

    public List<HashSet<string>> GetValuesByPrefix(string prefix)
    {
        var values = new List<HashSet<string>>();

        int level = 0;
        Dictionary<char, Item> currentChars = _rootChars;
        Item currentItem = null;

        foreach (char c in prefix)
        {
            if (currentChars.ContainsKey(c))
            {
                currentItem = currentChars[c];
                currentChars = currentItem.Items;
            }
            else
            {
                return values; // no matches found
            }
            level++;
        }

        ExtractValues(values, currentItem);

        return values;
    }

    public void ExtractValues(List<HashSet<string>> values, Item item)
    {
        foreach (Item subitem in item.Items.Values)
        {
            ExtractValues(values, subitem);
        }

        values.Add(item.Values);
    }

    public class Item
    {
        public int Level { get; set; }
        public char Letter { get; set; }
        public int Total { get; set; }
        public HashSet<string> Values { get; set; }
        public Dictionary<char, Item> Items { get; set; }

        public Item()
        {
            Values = new HashSet<string>();
            Items = new Dictionary<char, Item>();
        }
    }
}

Here is the unit testing & example code for how to use this class.

using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class StringIndexTest
    {
        [TestMethod]
        public void AddAndSearchValues()
        {

            var si = new StringIndex();

            si.Add("abcdef", "1");
            si.Add("abcdeff", "2");
            si.Add("abcdeffg", "3");
            si.Add("bcdef", "4");
            si.Add("bcdefg", "5");
            si.Add("cdefg", "6");
            si.Add("cdefgh", "7");

            var output = si.GetValuesByPrefixFlattened("abc");

            Assert.IsTrue(output.Contains("1") && output.Contains("2") && output.Contains("3"));
        }

        [TestMethod]
        public void RemoveAndSearch()
        {

            var si = new StringIndex();

            si.Add("abcdef", "1");
            si.Add("abcdeff", "2");
            si.Add("abcdeffg", "3");
            si.Add("bcdef", "4");
            si.Add("bcdefg", "5");
            si.Add("cdefg", "6");
            si.Add("cdefgh", "7");

            si.Remove("abcdef", "1");

            var output = si.GetValuesByPrefixFlattened("abc");

            Assert.IsTrue(!output.Contains("1") && output.Contains("2") && output.Contains("3"));
        }

        [TestMethod]
        public void Clear()
        {

            var si = new StringIndex();

            si.Add("abcdef", "1");
            si.Add("abcdeff", "2");
            si.Add("abcdeffg", "3");
            si.Add("bcdef", "4");
            si.Add("bcdefg", "5");
            si.Add("cdefg", "6");
            si.Add("cdefgh", "7");

            si.Clear();
            var output = si.GetValuesByPrefix("abc");

            Assert.IsTrue(output.Count == 0);
        }

        [TestMethod]
        public void AddAndSearchValuesCount()
        {

            var si = new StringIndex();

            si.Add("abcdef", "1");
            si.Add("abcdeff", "2");
            si.Add("abcdeffg", "3");
            si.Add("bcdef", "4");
            si.Add("bcdefg", "5");
            si.Add("cdefg", "6");
            si.Add("cdefgh", "7");

            si.Remove("cdefgh", "7");

            var output1 = si.GetValuesByPrefixCount("abc");
            var output2 = si.GetValuesByPrefixCount("b");
            var output3 = si.GetValuesByPrefixCount("bc");
            var output4 = si.GetValuesByPrefixCount("ca");

            Assert.IsTrue(output1 == 3 && output2 == 2 && output3 == 2 && output4 == 0);
        }
    }

Any suggestions on how to improve this class are welcome :)

Upvotes: 2

Frank Schwieterman

Reputation: 24480

I wanted a common string prefix, except I wanted to include any character (like /) and I did not want something performant/fancy just something I can read with tests. So I have this: https://github.com/fschwiet/DreamNJasmine/commit/ad802611ceacc673f2d03c30f7c8199f552b586f

public class CommonStringPrefix
{
    public static string Of(IEnumerable<string> strings)
    {
        var commonPrefix = strings.FirstOrDefault() ?? "";

        foreach(var s in strings)
        {
            var potentialMatchLength = Math.Min(s.Length, commonPrefix.Length);

            if (potentialMatchLength < commonPrefix.Length)
                commonPrefix = commonPrefix.Substring(0, potentialMatchLength);

            for(var i = 0; i < potentialMatchLength; i++)
            {
                if (s[i] != commonPrefix[i])
                {
                    commonPrefix = commonPrefix.Substring(0, i);
                    break;
                }
            }
        }

        return commonPrefix;
    }
}

Upvotes: 2

dtb

Reputation: 217313

string[] xs = new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/c" };

string x = string.Join("/", xs.Select(s => s.Split('/').AsEnumerable())
                              .Transpose()
                              .TakeWhile(s => s.All(d => d == s.First()))
                              .Select(s => s.First()));

with

public static IEnumerable<IEnumerable<T>> Transpose<T>(
    this IEnumerable<IEnumerable<T>> source)
{
    var enumerators = source.Select(e => e.GetEnumerator()).ToArray();
    try
    {
        while (enumerators.All(e => e.MoveNext()))
        {
            yield return enumerators.Select(e => e.Current).ToArray();
        }
    }
    finally
    {
        Array.ForEach(enumerators, e => e.Dispose());
    }
}

Upvotes: 37

Sam Holder

Reputation: 32936

Just loop round the characters of the shortest string and compare each character to the character in the same position in the other strings. Whilst they all match keep going. As soon as one doesn't match then the string up to the current position -1 is the answer.

Something like (pseudo code)

int count=0;
foreach(char c in shortestString)
{
    foreach(string s in otherStrings)
    {
        if (s[count]!=c)
        {
             return shortestString.SubString(0,count-1); //need to check count is not 0 
        }
    }
    count+=1;
 }
 return shortestString;

Upvotes: 15

John Feminella

Reputation: 311536

This is the longest common substring problem (although it's a slightly specialized case since you appear to only care about the prefix). There's no library implementation of the algorithm in the .NET platform that you can call directly, but the article linked here is chock-full of steps on how you'd do it yourself.

Upvotes: 5

Find common prefix of strings

Answers (16)

Related Questions