719016

Reputation: 10431

How to pipe the contents of a large tar.gz file to STDOUT?

I have a file, large.tar.gz, containing about 1 million files, about a quarter of which are HTML files, and I want to parse a few lines of each HTML file in it.

I want to avoid extracting the contents of large.tar.gz into a folder and then parsing the HTML files. Instead, how can I pipe the contents of the HTML files in large.tar.gz straight to STDOUT, so that I can grep/parse out the information I want from them?

I presume there must be some magic like:

tar -special_flags large.tar.gz | grep_only_files_with_extension html | xargs -n1 head -n 99999 | ./parse_contents.pl -

Any ideas?

Upvotes: 21

Views: 23460

Answers (1)

Cyrus

Reputation: 88563

With GNU tar, use this to extract the matching files from a tgz to stdout:

tar -xOzf large.tar.gz --wildcards '*.html' | grep ...

-O, --to-stdout: extract files to standard output
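Note that -O writes all matching files to stdout as one concatenated stream, with no marker between files. If you need to handle each HTML file separately (for example, to look at only a few lines of each), GNU tar's --to-command option may be a better fit: it runs a command once per extracted member and sets TAR_FILENAME to the member name. Below is a rough sketch of both approaches, assuming GNU tar (neither --wildcards nor --to-command is available in every tar implementation). The demo archive and the file names a.html, b.html and notes.txt are made up for illustration.

```shell
#!/bin/sh
# Build a tiny demo archive (hypothetical file names) so the commands
# can be tried without touching the real large.tar.gz.
workdir=$(mktemp -d)
printf '<html>\n<title>A</title>\n</html>\n' > "$workdir/a.html"
printf '<html>\n<title>B</title>\n</html>\n' > "$workdir/b.html"
printf 'not html\n' > "$workdir/notes.txt"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" a.html b.html notes.txt

# 1) -O: every matching member is written to stdout, back to back,
#    with nothing separating one file from the next.
all_html=$(tar -xOzf "$workdir/demo.tar.gz" --wildcards '*.html')

# 2) --to-command (GNU tar): the command runs once per matching member
#    and reads that member on its stdin. Here it prints the member name
#    followed by line 2 of the file.
per_file=$(tar -xzf "$workdir/demo.tar.gz" --wildcards '*.html' \
    --to-command='printf "%s: " "$TAR_FILENAME"; sed -n 2p')

printf '%s\n' "$all_html"
printf '%s\n' "$per_file"

rm -rf "$workdir"
```

For the real archive, the sed call could be replaced with your own parser reading from stdin, e.g. --to-command='./parse_contents.pl -', assuming the script accepts - as standard input as in the question.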

Upvotes: 49
