Road King

Reputation: 147

Parse thousands of xml files with awk

I have several thousand files and they each contain only one very long line.

I want to convert them all into one file with one entry per line, split at the ID fields. I have this working with a few files, but it takes too long on hundreds of files and seems to crash on thousands of files. I'm looking for a faster way that has no such limit.

(find -type f -name '*.xml' -exec cat {} \;) | awk '{gsub("ID","\nID");printf"%s",$0}' 

I have also tried this:

(find -type f -name '*.xml' -exec cat {} \;) | sed 's/ID/\nID/g' 

I think the problem is that I'm using replacement instead of insertion, or that it is using too much memory.

Thanks

Upvotes: 1

Views: 687

Answers (2)

Birei

Reputation: 36272

I can't test it with thousands of files, but instead of cat-ing all the data into memory before processing it with awk, try running awk on batches of those files at a time, like:

find . -type f -name '*.xml' -exec awk '{gsub("ID","\nID"); printf "%s", $0}' {} +
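With `-exec ... {} +`, find passes the file names to awk in large batches, so there is no separate cat step and awk reads the files directly. If you also want everything in a single output file, as in the question, the same command can simply be redirected (output.txt here is only an example name):

find . -type f -name '*.xml' -exec awk '{gsub("ID","\nID"); printf "%s", $0}' {} + > output.txt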

Upvotes: 2

perreal

Reputation: 98078

  1. Create a list of all the files you need to process
  2. Divide this list into smaller sub-lists of 50 files each
  3. Create a script that reads one sub-list, applies the ID substitution, and writes an intermediate file
  4. Create another script that runs the script from step 3 on each sub-list as a background process, 20 at a time, as many times as necessary
  5. Merge the intermediate output files (a sketch of these steps follows below)
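
Purely as an illustration of that plan, here is a rough bash sketch reusing the gsub from the question. The names filelist.txt, sublist. and merged.txt are placeholders, it assumes file names without spaces, and the 50/20 sizes are the ones suggested above:

#!/bin/bash
# Step 1: list every XML file to process
find . -type f -name '*.xml' > filelist.txt

# Step 2: split the list into sub-lists of 50 file names (sublist.aa, sublist.ab, ...)
split -l 50 filelist.txt sublist.

# Steps 3 and 4: run the ID substitution on each sub-list in the background,
# pausing after every 20 jobs so at most 20 run at once
count=0
for list in sublist.*; do
  xargs awk '{gsub("ID","\nID"); printf "%s", $0}' < "$list" > "$list.out" &
  count=$((count + 1))
  if [ "$count" -ge 20 ]; then
    wait
    count=0
  fi
done
wait

# Step 5: merge the intermediate files into the final output
cat sublist.*.out > merged.txt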

Upvotes: 1
