Reputation: 367
It's a silly question, but I haven't found the answer yet.
In my XML I've got the lines below:
<BLAH><BLAH><BLAH>
<ABC>123456</ABC>
<ABC>123456</ABC>
<ABC>adfadfaf</ABC>
<ABC>gdsgdhghd</ABC>
</BLAH></BLAH></BLAH>
The distinct count of values matching <ABC>*</ABC> is 3.
Basically, I want to count the unique values between <ABC> and </ABC>, so that I get 3 when I do a find & count in Notepad++ or with the Linux grep command.
Upvotes: 0
Views: 876
Reputation: 9875
Assuming that the input is formatted as shown in the example, you can use the code below.
That means every matching pair of <ABC> and </ABC> tags must be on one line with a text-only value in between.
grep -o '<ABC>[^<]*</ABC>' input.xml | sort -u | wc -l
The command may not work if the input is formatted differently or if the value between <ABC> and </ABC> contains other tags.
With the example input from the question it will print
3
It even works when there is more than one pair of <ABC> and </ABC> on a line.
With
<BLAH><BLAH><BLAH>
<ABC>123456</ABC>foo<ABC>1234567</ABC>
<ABC>123456</ABC>
<ABC>adfadfaf</ABC>
<ABC>gdsgdhghd</ABC>
</BLAH></BLAH></BLAH>
it prints
4
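To make the pipeline easier to try out, here is a minimal sketch that recreates the sample input from the question and wraps the same command in a function. The file name sample.xml and the function name count_distinct_abc are made up for this illustration, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
#!/bin/sh
# Recreate the sample input from the question.
# sample.xml is a hypothetical file name used only for this sketch.
cat > sample.xml <<'EOF'
<BLAH><BLAH><BLAH>
<ABC>123456</ABC>
<ABC>123456</ABC>
<ABC>adfadfaf</ABC>
<ABC>gdsgdhghd</ABC>
</BLAH></BLAH></BLAH>
EOF

# count_distinct_abc FILE (hypothetical helper):
# print the number of distinct <ABC>...</ABC> elements in FILE.
count_distinct_abc() {
    # grep -o prints every match on its own line,
    # sort -u drops duplicate lines,
    # wc -l counts the lines that remain.
    grep -o '<ABC>[^<]*</ABC>' "$1" | sort -u | wc -l
}

count_distinct_abc sample.xml

# Optional: show how often each value occurs instead of only the total.
grep -o '<ABC>[^<]*</ABC>' sample.xml | sort | uniq -c
```

Replacing sort -u with sort | uniq -c, as in the last line, is handy for checking which values are duplicated before trusting the total.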
Upvotes: 1
Reputation: 91488
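This will only work if the <ABC>...</ABC> lines are sorted, so that identical values are adjacent.
The regex below matches a line only when the following line is different, so the number of matches equals the number of distinct values.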
(<ABC>.+?</ABC>)\R(?!\1)
. matches newline
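Why the ordering matters can be sketched outside the editor as well. The snippet below is not the regex itself but an analogous check in awk: it counts groups of identical adjacent lines, which is what counting the regex matches amounts to. The function name count_runs and the unsorted sample data are assumptions made for this illustration.

```shell
#!/bin/sh
# count_runs (hypothetical helper): count groups of identical adjacent
# lines read from standard input.
count_runs() {
    awk 'NR == 1 || $0 != prev { n++ } { prev = $0 } END { print n + 0 }'
}

# Made-up sample in which the duplicate values are NOT adjacent.
unsorted='<ABC>123456</ABC>
<ABC>adfadfaf</ABC>
<ABC>123456</ABC>
<ABC>gdsgdhghd</ABC>'

# Unsorted: the two 123456 lines form separate groups, so this prints 4.
printf '%s\n' "$unsorted" | count_runs

# Sorted first: duplicates become adjacent, so this prints 3.
printf '%s\n' "$unsorted" | sort | count_runs
```

If the lines in the file are not already ordered, sorting them first (for example with the editor's line sort) should bring duplicates together before counting.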
Upvotes: 1