Reputation: 147
I have several thousand files, and each one contains only a single very long line.
I want to combine them all into one file with one entry per line, split at the ID fields. I have this working with a few files, but it takes too long on hundreds of files and seems to crash on thousands of files. I'm looking for a faster approach that doesn't hit that limit.
(find -type f -name '*.xml' -exec cat {} \;) | awk '{gsub("ID","\nID");printf"%s",$0}'
I have also tried this:
(find -type f -name '*.xml' -exec cat {} \;) | sed 's/ID/\nID/g'
I think the problem is that I'm using replacement instead of insertion, or that it is using too much memory.
Thanks
Upvotes: 1
Views: 687
Reputation: 36272
I can't test it with thousands of files, but instead of cat-ing all of the data into memory before processing it with awk, try running awk directly on a batch of those files at a time, like:
find . -type f -name "*.xml*" -exec awk '{gsub("ID","\nID");printf"%s",$0}' {} +
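With -exec ... {} +, find passes many file names to a single awk invocation instead of spawning one process per file, so each file is processed as its own record. If you want everything collected into a single output file, you can redirect the combined output; the name all.txt below is just an assumed placeholder:

find . -type f -name "*.xml*" -exec awk '{gsub("ID","\nID");printf"%s",$0}' {} + > all.txt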
Upvotes: 2