Reputation: 461
I am having 4 strings:
"h:/a/b/c"
"h:/a/b/d"
"h:/a/b/e"
"h:/a/c"
I want to find the common prefix for those strings, i.e. "h:/a"
.
How to find that?
Usually I'd split the string with delimiter '/'
and put it in another list, and so on.
Is there any better way to do it?
Upvotes: 36
Views: 32748
Reputation: 2606
A short LINQy solution of mine (using MinBy
from .NET 6).
var samples = new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/e" };
var commonPrefix = new string(samples.MinBy(s => s.Length)
.TakeWhile((c, i) => samples.All(s => s[i] == c)).ToArray());
For .NET 5 and older use this:
var samples = new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/e" };
var commonPrefix = new string(
samples.First().Substring(0, samples.Min(s => s.Length))
.TakeWhile((c, i) => samples.All(s => s[i] == c)).ToArray());
Upvotes: 29
Reputation: 356
public string LongestCommonPrefix(string[] strs)
{
return strs.Aggregate((seed,z) => string.Join("",seed.TakeWhile((v,i)=> z.Length > i && v == z[i])));
}
Upvotes: 1
Reputation: 5074
I'm late to the party, but I'll give my 2 cents:
public static String CommonPrefix(String str, params String[] more)
{
var prefixLength = str
.TakeWhile((c, i) => more.All(s => i < s.Length && s[i] == c))
.Count();
return str.Substring(0, prefixLength);
}
Explanation:
This works by walking through the chars of str
as long as All
the other strings have the same char c
at index i
.
The signature split in String
and params String[]
ensures that at least one string is provided, no runtime checks needed.
Count
the prefix length and return Substring(0, prefixLength)
than to reassemble the enumerated chars by means of String.Join()
or Enumerable.Aggregate()
Upvotes: 1
Reputation: 1202
An improvement on Yegor's answer
var samples = new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/e" };
var commonPrefix = new string(
samples.Min().TakeWhile((c, i) => samples.All(s => s[i] == c)).ToArray());
First we know that the longest common prefix cannot be longer than the shortest element. So take the shortest one and take chars from it while all other strings have the same char at the same position. In the extreme case we take all chars from the shortest element. By iterating over the shortest element the index lookup will not throw any exceptions.
Another (worse but still interesting) way to solve it using LINQ would be this:
samples.Aggregate(samples.Min(), (current, next) => new string(current.TakeWhile((c,i) => next[i] == c).ToArray() ));
This one works by creating a commonPrefix and comparing that to each element one by one. In each comparison the commonPrefix is either maintained or decreased. In the first iteration current is the min element, but each iteration after that it is the best commonPrefix found so far. Think of this as a depth first solution while the first one is a breadth first.
This type of solution might be improved upon by sorting the samples in terms of length so that the shortest elements are compared first.
This type of solution cannot really be better than the first however. In the best case this is as-good as the first solution. But otherwise it will do extra work by finding temporary commonPrefixes that are longer than necessary.
Upvotes: 0
Reputation: 99
This is a simple method that finds common string prefix.
public static string GetCommonStartingPath(string[] keys)
{
Array.Sort(keys, StringComparer.InvariantCulture);
string a1 = keys[0], a2 = keys[keys.Length - 1];
int L = a1.Length, i = 0;
while (i < L && a1[i] == a2[i])
{
i++;
}
string result = a1.Substring(0, i);
return result;
}
Upvotes: 0
Reputation: 1329
Here I implemented quite efficient method when you must analyse huge amount of strings, I'm caching counts and lengths here as well which improves performance by about 1,5x on my tests comparing to properties access in loops:
using System.Collections.Generic;
using System.Text;
........
public static string GetCommonPrefix ( IList<string> strings )
{
var stringsCount = strings.Count;
if( stringsCount == 0 )
return null;
if( stringsCount == 1 )
return strings[0];
var sb = new StringBuilder( strings[0] );
string str;
int i, j, sbLen, strLen;
for( i = 1; i < stringsCount; i++ )
{
str = strings[i];
sbLen = sb.Length;
strLen = str.Length;
if( sbLen > strLen )
sb.Length = sbLen = strLen;
for( j = 0; j < sbLen; j++ )
{
if( sb[j] != str[j] )
{
sb.Length = j;
break;
}
}
}
return sb.ToString();
}
UPD: I also implemented parallel version which uses above method as final step:
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
........
public static string GetCommonPrefixParallel ( IList<string> strings )
{
var stringsCount = strings.Count;
if( stringsCount == 0 )
return null;
if( stringsCount == 1 )
return strings[0];
var firstStr = strings[0];
var finalList = new List<string>();
var finalListLock = new object();
Parallel.For( 1, stringsCount,
() => new StringBuilder( firstStr ),
( i, loop, localSb ) =>
{
var sbLen = localSb.Length;
var str = strings[i];
var strLen = str.Length;
if( sbLen > strLen )
localSb.Length = sbLen = strLen;
for( int j = 0; j < sbLen; j++ )
{
if( localSb[j] != str[j] )
{
localSb.Length = j;
break;
}
}
return localSb;
},
( localSb ) =>
{
lock( finalListLock )
{
finalList.Add( localSb.ToString() );
}
} );
return GetCommonPrefix( finalList );
}
GetCommonPrefixParallel() boosts twice comparing to GetCommonPrefix() on huge strings amount and when strings length is significant. On small arrays with short strings GetCommonPrefix() works a little better. I tested on MacBook Pro Retina 13''.
Upvotes: 0
Reputation: 69
Top answer can be improved to ignore case:
.TakeWhile(s =>
{
var reference = s.First();
return s.All(d => string.Equals(reference, d, StringComparison.OrdinalIgnoreCase));
})
Upvotes: -1
Reputation: 4290
Working code based on Sam Holder's solution (note it gives h:/a/ not h:/a as the longest common initial substring in the question):
using System;
namespace CommonPrefix
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(CommonPrefix(new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/c" })); // "h:/a/"
Console.WriteLine(CommonPrefix(new[] { "abc", "abc" })); // "abc"
Console.WriteLine(CommonPrefix(new[] { "abc" })); // "abc"
Console.WriteLine(CommonPrefix(new string[] { })); // ""
Console.WriteLine(CommonPrefix(new[] { "a", "abc" })); // "a"
Console.WriteLine(CommonPrefix(new[] { "abc", "a" })); // "a"
Console.ReadKey();
}
private static string CommonPrefix(string[] ss)
{
if (ss.Length == 0)
{
return "";
}
if (ss.Length == 1)
{
return ss[0];
}
int prefixLength = 0;
foreach (char c in ss[0])
{
foreach (string s in ss)
{
if (s.Length <= prefixLength || s[prefixLength] != c)
{
return ss[0].Substring(0, prefixLength);
}
}
prefixLength++;
}
return ss[0]; // all strings identical up to length of ss[0]
}
}
}
Upvotes: 7
Reputation: 1969
I wrote this ICollection extension to find the longest common base Uri from a collection of web addresses.
As it only check the collection of strings at every slash it will be a bit faster that a generic prefix routine (Not counting my inefficient algorithm!). It's verbose, but easy to follow...my favourite type of code ;-)
Ignores 'http://' and 'https://', as well as case.
/// <summary>
/// Resolves a common base Uri from a list of Uri strings. Ignores case. Includes the last slash
/// </summary>
/// <param name="collectionOfUriStrings"></param>
/// <returns>Common root in lowercase</returns>
public static string GetCommonUri(this ICollection<string> collectionOfUriStrings)
{
//Check that collection contains entries
if (!collectionOfUriStrings.Any())
return string.Empty;
//Check that the first is no zero length
var firstUri = collectionOfUriStrings.FirstOrDefault();
if(string.IsNullOrEmpty(firstUri))
return string.Empty;
//set starting position to be passed '://'
int previousSlashPos = firstUri.IndexOf("://", StringComparison.OrdinalIgnoreCase) + 2;
int minPos = previousSlashPos + 1; //results return must have a slash after this initial position
int nextSlashPos = firstUri.IndexOf("/", previousSlashPos + 1, StringComparison.OrdinalIgnoreCase);
//check if any slashes
if (nextSlashPos == -1)
return string.Empty;
do
{
string common = firstUri.Substring(0, nextSlashPos + 1);
//check against whole collection
foreach (var collectionOfUriString in collectionOfUriStrings)
{
if (!collectionOfUriString.StartsWith(common, StringComparison.OrdinalIgnoreCase))
{
//return as soon as a mismatch is found
return previousSlashPos > minPos ? firstUri.Substring(0, previousSlashPos + 1).ToLower() : string.Empty ;
}
}
previousSlashPos = nextSlashPos;
nextSlashPos = firstUri.IndexOf("/", previousSlashPos + 1, StringComparison.OrdinalIgnoreCase);
} while (nextSlashPos != -1);
return previousSlashPos > minPos ? firstUri.Substring(0, previousSlashPos + 1).ToLower() : string.Empty;
}
Upvotes: 1
Reputation: 180
I needed to look for the longest common prefix in dissimilar strings. I came up with:
private string FindCommonPrefix(List<string> list)
{
List<string> prefixes = null;
for (int len = 1; ; ++len)
{
var x = list.Where(s => s.Length >= len)
.GroupBy(c => c.Substring(0,len))
.Where(grp => grp.Count() > 1)
.Select(grp => grp.Key)
.ToList();
if (!x.Any())
{
break;
}
// Copy last list
prefixes = new List<string>(x);
}
return prefixes == null ? string.Empty : prefixes.First();
}
If there is more than one prefix with the same length, it arbitrarily returns the first one found. Also it's case-sensitive. Both these points can be addressed by the reader!
Upvotes: 1
Reputation: 19705
My approach would be, take the first string. Get letter by letter while all the other string got the same letter on the same index position and stop if there is no match. Remove the last character if it's a separator.
var str_array = new string[]{"h:/a/b/c",
"h:/a/b/d",
"h:/a/b/e",
"h:/a/c"};
var separator = '/';
// get longest common prefix (optinally use ToLowerInvariant)
var ret = str_array.Any()
? str_array.First().TakeWhile((s,i) =>
str_array.All(e =>
Char.ToLowerInvariant(s) == Char.ToLowerInvariant(e.Skip(i).Take(1).SingleOrDefault())))
: String.Empty;
// remove last character if it's a separator (optional)
if (ret.LastOrDefault() == separator)
ret = ret.Take(ret.Count() -1);
string prefix = new String(ret.ToArray());
Upvotes: 1
Reputation: 1351
Here is a custom implementation of the trie algorithm in c# (http://en.wikipedia.org/wiki/Trie). It is used to perform an indexed string via prefixes. This class has O(1) write and reads for leaf nodes. For prefix searches, the performance is O(log n), however the count of results for prefix is O(1).
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
public class StringIndex
{
private Dictionary<char, Item> _rootChars;
public StringIndex()
{
_rootChars = new Dictionary<char, Item>();
}
public void Add(string value, string data)
{
int level = 0;
Dictionary<char, Item> currentChars = _rootChars;
Item currentItem = null;
foreach (char c in value)
{
if (currentChars.ContainsKey(c))
{
currentItem = currentChars[c];
}
else
{
currentItem = new Item() { Level = level, Letter = c };
currentChars.Add(c, currentItem);
}
currentChars = currentItem.Items;
level++;
}
if (!currentItem.Values.Contains(data))
{
currentItem.Values.Add(data);
IncrementCount(value);
}
}
private void IncrementCount(string value)
{
Dictionary<char, Item> currentChars = _rootChars;
Item currentItem = null;
foreach (char c in value)
{
currentItem = currentChars[c];
currentItem.Total++;
currentChars = currentItem.Items;
}
}
public void Remove(string value, string data)
{
Dictionary<char, Item> currentChars = _rootChars;
Dictionary<char, Item> parentChars = null;
Item currentItem = null;
foreach (char c in value)
{
if (currentChars.ContainsKey(c))
{
currentItem = currentChars[c];
parentChars = currentChars;
currentChars = currentItem.Items;
}
else
{
return; // no matches found
}
}
if (currentItem.Values.Contains(data))
{
currentItem.Values.Remove(data);
DecrementCount(value);
if (currentItem.Total == 0)
{
parentChars.Remove(currentItem.Letter);
}
}
}
private void DecrementCount(string value)
{
Dictionary<char, Item> currentChars = _rootChars;
Item currentItem = null;
foreach (char c in value)
{
currentItem = currentChars[c];
currentItem.Total--;
currentChars = currentItem.Items;
}
}
public void Clear()
{
_rootChars.Clear();
}
public int GetValuesByPrefixCount(string prefix)
{
int valuescount = 0;
int level = 0;
Dictionary<char, Item> currentChars = _rootChars;
Item currentItem = null;
foreach (char c in prefix)
{
if (currentChars.ContainsKey(c))
{
currentItem = currentChars[c];
currentChars = currentItem.Items;
}
else
{
return valuescount; // no matches found
}
level++;
}
valuescount = currentItem.Total;
return valuescount;
}
public HashSet<string> GetValuesByPrefixFlattened(string prefix)
{
var results = GetValuesByPrefix(prefix);
return new HashSet<string>(results.SelectMany(x => x));
}
public List<HashSet<string>> GetValuesByPrefix(string prefix)
{
var values = new List<HashSet<string>>();
int level = 0;
Dictionary<char, Item> currentChars = _rootChars;
Item currentItem = null;
foreach (char c in prefix)
{
if (currentChars.ContainsKey(c))
{
currentItem = currentChars[c];
currentChars = currentItem.Items;
}
else
{
return values; // no matches found
}
level++;
}
ExtractValues(values, currentItem);
return values;
}
public void ExtractValues(List<HashSet<string>> values, Item item)
{
foreach (Item subitem in item.Items.Values)
{
ExtractValues(values, subitem);
}
values.Add(item.Values);
}
public class Item
{
public int Level { get; set; }
public char Letter { get; set; }
public int Total { get; set; }
public HashSet<string> Values { get; set; }
public Dictionary<char, Item> Items { get; set; }
public Item()
{
Values = new HashSet<string>();
Items = new Dictionary<char, Item>();
}
}
}
Here is the unit testing & example code for how to use this class.
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
[TestClass]
public class StringIndexTest
{
[TestMethod]
public void AddAndSearchValues()
{
var si = new StringIndex();
si.Add("abcdef", "1");
si.Add("abcdeff", "2");
si.Add("abcdeffg", "3");
si.Add("bcdef", "4");
si.Add("bcdefg", "5");
si.Add("cdefg", "6");
si.Add("cdefgh", "7");
var output = si.GetValuesByPrefixFlattened("abc");
Assert.IsTrue(output.Contains("1") && output.Contains("2") && output.Contains("3"));
}
[TestMethod]
public void RemoveAndSearch()
{
var si = new StringIndex();
si.Add("abcdef", "1");
si.Add("abcdeff", "2");
si.Add("abcdeffg", "3");
si.Add("bcdef", "4");
si.Add("bcdefg", "5");
si.Add("cdefg", "6");
si.Add("cdefgh", "7");
si.Remove("abcdef", "1");
var output = si.GetValuesByPrefixFlattened("abc");
Assert.IsTrue(!output.Contains("1") && output.Contains("2") && output.Contains("3"));
}
[TestMethod]
public void Clear()
{
var si = new StringIndex();
si.Add("abcdef", "1");
si.Add("abcdeff", "2");
si.Add("abcdeffg", "3");
si.Add("bcdef", "4");
si.Add("bcdefg", "5");
si.Add("cdefg", "6");
si.Add("cdefgh", "7");
si.Clear();
var output = si.GetValuesByPrefix("abc");
Assert.IsTrue(output.Count == 0);
}
[TestMethod]
public void AddAndSearchValuesCount()
{
var si = new StringIndex();
si.Add("abcdef", "1");
si.Add("abcdeff", "2");
si.Add("abcdeffg", "3");
si.Add("bcdef", "4");
si.Add("bcdefg", "5");
si.Add("cdefg", "6");
si.Add("cdefgh", "7");
si.Remove("cdefgh", "7");
var output1 = si.GetValuesByPrefixCount("abc");
var output2 = si.GetValuesByPrefixCount("b");
var output3 = si.GetValuesByPrefixCount("bc");
var output4 = si.GetValuesByPrefixCount("ca");
Assert.IsTrue(output1 == 3 && output2 == 2 && output3 == 2 && output4 == 0);
}
}
Any suggestions on how to improve this class are welcome :)
Upvotes: 2
Reputation: 24480
I wanted a common string prefix, except I wanted to include any character (like /) and I did not want something performant/fancy just something I can read with tests. So I have this: https://github.com/fschwiet/DreamNJasmine/commit/ad802611ceacc673f2d03c30f7c8199f552b586f
public class CommonStringPrefix
{
public static string Of(IEnumerable<string> strings)
{
var commonPrefix = strings.FirstOrDefault() ?? "";
foreach(var s in strings)
{
var potentialMatchLength = Math.Min(s.Length, commonPrefix.Length);
if (potentialMatchLength < commonPrefix.Length)
commonPrefix = commonPrefix.Substring(0, potentialMatchLength);
for(var i = 0; i < potentialMatchLength; i++)
{
if (s[i] != commonPrefix[i])
{
commonPrefix = commonPrefix.Substring(0, i);
break;
}
}
}
return commonPrefix;
}
}
Upvotes: 2
Reputation: 217313
string[] xs = new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/c" };
string x = string.Join("/", xs.Select(s => s.Split('/').AsEnumerable())
.Transpose()
.TakeWhile(s => s.All(d => d == s.First()))
.Select(s => s.First()));
with
public static IEnumerable<IEnumerable<T>> Transpose<T>(
this IEnumerable<IEnumerable<T>> source)
{
var enumerators = source.Select(e => e.GetEnumerator()).ToArray();
try
{
while (enumerators.All(e => e.MoveNext()))
{
yield return enumerators.Select(e => e.Current).ToArray();
}
}
finally
{
Array.ForEach(enumerators, e => e.Dispose());
}
}
Upvotes: 37
Reputation: 32936
Just loop round the characters of the shortest string and compare each character to the character in the same position in the other strings. Whilst they all match keep going. As soon as one doesn't match then the string up to the current position -1 is the answer.
Something like (pseudo code)
int count=0;
foreach(char c in shortestString)
{
foreach(string s in otherStrings)
{
if (s[count]!=c)
{
return shortestString.SubString(0,count-1); //need to check count is not 0
}
}
count+=1;
}
return shortestString;
Upvotes: 15
Reputation: 311536
This is the longest common substring problem (although it's a slightly specialized case since you appear to only care about the prefix). There's no library implementation of the algorithm in the .NET platform that you can call directly, but the article linked here is chock-full of steps on how you'd do it yourself.
Upvotes: 5