thereceptionist

Reputation: 77

Extracting specific folders in multiple tar.gz files recursively

I am working with Synthea, the open synthetic patient and population health dataset.

The dataset comes as a 21 GB tar.gz that extracts into a set of tar.gz files containing the data in a number of formats.

The extracted source folder structure looks like this:

|-- output_11_20170528T113605.tar.gz
|-- output_1_20170524T232103.tar.gz
|-- output_12_20170528T195303.tar.gz
|-- output_2_20170525T073836.tar.gz
|-- output_3_20170525T161555.tar.gz
|-- output_4_20170526T004637.tar.gz
|-- output_5_20170526T091439.tar.gz
|-- output_6_20170526T173337.tar.gz
|-- output_7_20170527T015508.tar.gz
|-- output_8_20170527T102552.tar.gz
|-- output_9_20170527T185007.tar.gz

I have tried to extract only the CSV files using the command below, which works well for a single file:

tar -zxvf output_1_20170524T232103.tar.gz "output_1*csv*" -C ../synthea_output_folder

It would be neat to build a shell script that iterates over these files and extracts the CSV folder from each tar.gz file, so that they appear in synthea_output_folder like so:

|-- output_11/csv
|-- output_1/csv
|-- output_12/csv
|-- output_2/csv
|-- output_3/csv
|-- output_4/csv
|-- output_5/csv
|-- output_6/csv
|-- output_7/csv
|-- output_8/csv
|-- output_9/csv

I found a shell script that untars every file in turn, but I don't know how to extract only the CSV folder from each one:

for f in *.tar.gz; do tar -xzvf "$f"; done

Upvotes: -1

Views: 934

Answers (1)

thereceptionist

Reputation: 77

Possible solution

After tinkering with the shell code above, I managed to extract only the csv folders by adding a csv wildcard pattern:

for f in *.tar.gz; do tar -xzvf "$f" "*csv*" -C ../synthea_output; done

The output now looks like this:

|-- output_1
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_10
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_11
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_12
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_2
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_3
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_4
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_5
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_6
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_7
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_8
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
`-- output_9
    `-- csv
        |-- allergies.csv
        |-- careplans.csv
        |-- conditions.csv
        |-- encounters.csv
        |-- immunizations.csv
        |-- medications.csv
        |-- observations.csv
        |-- patients.csv
        `-- procedures.csv
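
For anyone whose tar behaves differently, here is a slightly more defensive sketch of the same loop. It is not a drop-in replacement, and it rests on a few assumptions. The function name extract_csv is made up for illustration. The pattern '*/csv/*' assumes each archive stores its files as output_N/csv/..., as in the tree above. The version check assumes that GNU tar needs --wildcards before it treats member names as patterns, while bsdtar (for example on macOS) matches patterns by default. -C is placed before the pattern so that it clearly applies to the extracted members.

```shell
#!/usr/bin/env bash
# Sketch: extract only the csv/ members from every .tar.gz in a directory.
# extract_csv is a hypothetical helper name; the member layout
# (output_N/csv/...) is an assumption based on the tree shown above.
extract_csv() {
    local src_dir=$1 dest_dir=$2 archive
    local -a wildcard_opt=()

    # GNU tar only treats member names as patterns when --wildcards is given;
    # bsdtar matches patterns by default, so the flag is added for GNU tar only.
    if tar --version 2>/dev/null | grep -q 'GNU tar'; then
        wildcard_opt=(--wildcards)
    fi

    # tar -C fails if the target directory does not exist yet.
    mkdir -p "$dest_dir" || return 1

    for archive in "$src_dir"/*.tar.gz; do
        # Skip the literal pattern when the glob matched no files.
        [ -e "$archive" ] || continue

        # -C precedes the pattern so it applies to the members being extracted.
        tar -xzf "$archive" -C "$dest_dir" "${wildcard_opt[@]}" '*/csv/*' \
            || echo "warning: could not extract csv members from $archive" >&2
    done
}
```

Run from the folder holding the output_*.tar.gz files, it would be called as: extract_csv . ../synthea_output

If the member layout is in doubt, listing one archive first with tar -tzf shows the exact paths the pattern has to match.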

Upvotes: 0
