Reputation: 421
I was working on finding out the Common string part in the String list. If we take a sample data set
private readonly List<string> Xpath = new List<string>()
{
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(1)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(2)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(3)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(4)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(5)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(6)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(7)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(8)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(9)>H2:nth-of-type(1)"
};
From this, I want to find out to which children these are similar. data is an Xpath list. Programmatically I should be able to tell
Expected output:
BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV
In order to get this What I did was like this. I separate each item by >
and then create a list of items for each dataset originally.
Then using this find out what are the unique items
private IEnumerable<T> GetCommonItems<T>(IEnumerable<T>[] lists)
{
HashSet<T> hs = new HashSet<T>(lists.First());
for (int i = 1; i < lists.Length; i++)
{
hs.IntersectWith(lists[i]);
}
return hs;
}
Able to find out the unique values and create a dataset again. But what happened is if this contains Ex:- Div in two places and it also in every originally dataset even then this method will pick up only one Div.
From then I would get something like this:
BODY>MAIN:nth-of-type(1)>DIV>SECTION
But I need this
BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of- type(3)>DIV>ARTICLE>DIV>DIV>DIV
Upvotes: 3
Views: 130
Reputation: 1189
Parsing the string by the seperator >
is a good idea. Instead of then creating a list of unique items you should create a list of all items contained in the string which would result in
{
"BODY",
"MAIN:nth-of-type(1)",
"DIV",
"SECTTION",
"DIV",
...
}
for the first entry of your XPath list.
This way you create a List<List<string>>
containing every element of each entry of your XPath list. You then can compare all first elements of the inner lists. If they are equal save that elements value to you output and proceed with all second elements and so on until you find an element that is not equal in all outer lists.
Edit:
After seperating your list by the >
seperator this could look something like this:
List<List<string>> XPathElementsLists;
List<string> resultElements = new List<string>();
string result;
XPathElementsLists = ParseElementsFormXPath(XPath);
for (int i = 0; i < XPathElementsLists[0].Count; i++)
{
bool isEqual = true;
string compareElemment = XPathElementsLists[0][i];
foreach (List<string> element in XPathElementsLists)
{
if (!String.Equals(compareElemment, element))
{
isEqual = false;
break;
}
}
if (!isEqual)
{
break;
}
resultElements.Add(compareElemment);
}
result = String.Join(">", resultElements.ToArray());
Upvotes: 1
Reputation: 22819
Disclaimer: This is not the most performant solution but it works :)
>
characterchar separator = '>';
IEnumerable<string> firstPathChunks = Xpath[0].Split(separator);
var chunks = Xpath.Select(path => path.Split(separator).ToList()).ToArray();
firstPathChunks
chunks
sb
void Process(StringBuilder sb)
{
foreach (var pathChunk in firstPathChunks)
{
foreach (var chunk in chunks)
{
if (chunk[0] != pathChunk)
{
return;
}
chunk.RemoveAt(0);
}
sb.Append(pathChunk);
sb.Append(separator);
}
}
Sample usage
var sb = new StringBuilder();
Process(sb);
Console.WriteLine(sb.ToString());
Output
BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>
Upvotes: 1