I have an XML file and need to count the number of characters between occurrences of a pattern (a tag) that is repeated throughout the file.
The pattern is:
<controlfield tag="001">
Example XML file content:
<datafield tag="650" ind1="0" ind2="4">
<subfield code="a">xxx</subfield>
<subfield code="x">sdf</subfield>
</datafield>
<datafield tag="650" ind1="0" ind2="4">
<subfield code="a">fff</subfield>
</datafield>
<datafield tag="650" ind1="0" ind2="4">
<subfield code="a">asdfaf</subfield>
<subfield code="x">fdfdf</subfield>
<subfield code="x">dfdfdf</subfield>
</datafield>
<controlfield tag="001">000000355</controlfield>
<datafield tag="909" ind1=" " ind2=" ">
<subfield code="a">AGR01</subfield>
<subfield code="b">ph</subfield>
<subfield code="c">AGRP</subfield>
</datafield>
<datafield tag="910" ind1=" " ind2=" ">
<subfield code="a">AGR</subfield>
</datafield>
<controlfield tag="001">000000358</controlfield>
<datafield tag="590" ind1=" " ind2=" ">
<subfield code="a">19. dfsdfs em 2015</subfield>
<subfield code="w">CECLI</subfield>
</datafield>
<datafield tag="650" ind1="0" ind2="4">
<subfield code="a">Topografia</subfield>
</datafield>
<controlfield tag="001">000000365</controlfield>
I read https://unix.stackexchange.com/questions/295332/i-need-the-counts-of-lines-between-two-matching-patterns and tried:
sed -n '/tag="001"/,/tag="001"/p' file.xml | wc -l
But only a single count was printed.
I need a separate count for each occurrence of the pattern. For the example above, I need 3 counts:
1. the number of characters before <controlfield tag="001">000000355</controlfield>
2. the number of characters between <controlfield tag="001">000000355</controlfield> and <controlfield tag="001">000000358</controlfield>
3. the number of characters between <controlfield tag="001">000000358</controlfield> and <controlfield tag="001">000000365</controlfield>
Can you help me?
With GNU awk:
$ awk -v RS="<controlfield tag=\"001\">[0-9]+</controlfield>" '{print length()}' file
394
253
239
1
Each count includes the line feeds inside its section, and the final 1 is the line feed that follows the last tag. You may want to remove the line feeds before calculating the length.
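A minimal sketch of removing the line feeds, assuming (as the answer does) an awk that treats a multi-character RS as a regular expression, such as GNU awk. The function name count_without_newlines is hypothetical. Each record has its line feeds deleted before it is measured, and records that end up empty are skipped, which drops the record holding only the trailing line feed.

```shell
#!/bin/sh
# Hypothetical helper: per-section character counts with line feeds removed.
# Relies on awk treating a multi-character RS as a regular expression
# (GNU awk does); each record is the text up to the next tag.
count_without_newlines() {
    awk -v RS='<controlfield tag="001">[0-9]+</controlfield>' '
        {
            gsub(/\n/, "")              # drop the line feeds inside this section
            if (length($0) > 0)         # skip records that are now empty, e.g. the
                print length($0)        # trailing line feed after the last tag
        }' "$1"
}
```

Usage: count_without_newlines file.xml. Skipping empty records also hides a section that is genuinely empty (two adjacent tags, or a tag at the very start of the file); remove the if test to keep those as 0, at the cost of a final 0 for the trailing line feed.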
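If only an awk without regular-expression RS support is available, a line-based sketch can report the same per-section counts with line feeds excluded. It assumes each <controlfield tag="001"> element sits alone on its own line, as in the sample; any other text on a tag line is ignored, and text after the last tag is not reported, so the example file gives exactly the three requested counts. The function name count_by_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: line-based per-section character counts, line feeds excluded.
# Assumes every <controlfield tag="001"> element is alone on its own line;
# other text on a tag line is ignored, and nothing after the last tag is printed.
count_by_lines() {
    awk '
        /<controlfield tag="001">[0-9]+<\/controlfield>/ {
            print total + 0   # characters since the file start or the previous tag
            total = 0         # start counting the next section
            next
        }
        { total += length($0) }   # length($0) does not include the line feed
    ' "$1"
}
```

Usage: count_by_lines file.xml. Because it reads line by line, this version does not depend on GNU awk extensions.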