Naga

Reputation: 367

Count Distinct XML Pattern in Notepad++ or in Linux Grep

It's a silly question, but I haven't found the answer yet.

In my XML I have the lines below:

<BLAH><BLAH><BLAH>
<ABC>123456</ABC>
<ABC>123456</ABC>
<ABC>adfadfaf</ABC>
<ABC>gdsgdhghd</ABC>
</BLAH></BLAH></BLAH>

The distinct count of values matching <ABC>*</ABC> is 3.

Basically, I want to count the unique values between <ABC> and </ABC> (3 in this example) when I do a find & count in Notepad++ or use the Linux grep command.

Upvotes: 0

Views: 876

Answers (2)

Bodo

Reputation: 9875

Assuming that the input has the format shown in the example, you can use the command below.

That means every pair of corresponding <ABC> and </ABC> tags must be on one line with a text-only value in between.

grep -o '<ABC>[^<]*</ABC>' input.xml | sort -u | wc -l

The command may not work if the input is formatted in other ways or if the value between <ABC> and </ABC> contains other tags.

With the example input from the question it will print

3

It even works when there is more than one pair of <ABC> and </ABC> in a line.

With

<BLAH><BLAH><BLAH>
<ABC>123456</ABC>foo<ABC>1234567</ABC>
<ABC>123456</ABC>
<ABC>adfadfaf</ABC>
<ABC>gdsgdhghd</ABC>
</BLAH></BLAH></BLAH>

it prints

4
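To check which values were counted, the same pipeline can be extended to list each distinct value with its number of occurrences. This is only a sketch under the same assumptions as above (each <ABC>...</ABC> pair on one line, text-only values); the function names count_distinct_abc and list_abc_counts are made up for illustration.

```shell
#!/bin/sh
# count_distinct_abc FILE (hypothetical helper)
# Wraps the pipeline from this answer: extract every <ABC>...</ABC>
# match, drop duplicates, count what is left.
count_distinct_abc() {
    # tr strips the padding some wc implementations add around the number
    grep -o '<ABC>[^<]*</ABC>' "$1" | sort -u | wc -l | tr -d ' '
}

# list_abc_counts FILE (hypothetical helper)
# Prints one line per distinct value: occurrence count, then the value.
list_abc_counts() {
    grep -o '<ABC>[^<]*</ABC>' "$1" |
        sed -e 's/^<ABC>//' -e 's/<\/ABC>$//' |  # remove the surrounding tags
        sort |                                    # uniq needs adjacent duplicates
        uniq -c
}
```

For example, list_abc_counts input.xml on the question's input should report 123456 with a count of 2 and the other two values with a count of 1 each.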

Upvotes: 1

Toto

Reputation: 91488

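This only works if the <ABC>...</ABC> lines are sorted, so that identical lines are adjacent: the regex counts every <ABC>...</ABC> line that is not immediately followed by an identical one.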

  • Ctrl+F
  • Find what: (<ABC>.+?</ABC>)\R(?!\1)
  • CHECK Match case
  • CHECK Wrap around
  • CHECK Regular expression
  • UNCHECK . matches newline
  • Count
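The ordering requirement can be illustrated on the command line, where uniq behaves similarly to the regex above: it only collapses duplicates that sit on adjacent lines. The sketch below is a rough shell analogue, not the Notepad++ method itself; the function names count_runs and count_runs_sorted are made up for illustration, and each <ABC>...</ABC> element is assumed to be on a line of its own.

```shell
#!/bin/sh
# count_runs FILE (hypothetical helper)
# Counts <ABC> lines after collapsing only ADJACENT duplicates.
# On unsorted input a value that reappears later is counted again.
count_runs() {
    grep '<ABC>' "$1" | uniq | wc -l | tr -d ' '
}

# count_runs_sorted FILE (hypothetical helper)
# Sorting first makes identical lines adjacent, so every distinct
# line is counted exactly once.
count_runs_sorted() {
    grep '<ABC>' "$1" | sort | uniq | wc -l | tr -d ' '
}
```

If the lines 123456, adfadfaf, 123456 appear in that order, count_runs reports 3 while count_runs_sorted reports 2, which is why the lines need to be sorted before using Count.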


Upvotes: 1
