Reputation: 77
I am working with Synthea, the open synthetic patient and population health dataset.
The dataset comes as a 21 GB tar.gz that extracts into a set of tar.gz files, each containing the data in a number of formats.
The extracted source folder structure looks like this:
|-- output_11_20170528T113605.tar.gz
|-- output_1_20170524T232103.tar.gz
|-- output_12_20170528T195303.tar.gz
|-- output_2_20170525T073836.tar.gz
|-- output_3_20170525T161555.tar.gz
|-- output_4_20170526T004637.tar.gz
|-- output_5_20170526T091439.tar.gz
|-- output_6_20170526T173337.tar.gz
|-- output_7_20170527T015508.tar.gz
|-- output_8_20170527T102552.tar.gz
|-- output_9_20170527T185007.tar.gz
I have tried to extract only the CSV files using the command below, which works well for a single file:
tar -zxvf output_1_20170524T232103.tar.gz "output_1*csv*" -C ../synthea_output_folder
It would be neat to build a shell script that iterates over these files and extracts the CSV folder from each tar.gz file, so that they appear in synthea_output_folder like so:
|-- output_11/csv
|-- output_1/csv
|-- output_12/csv
|-- output_2/csv
|-- output_3/csv
|-- output_4/csv
|-- output_5/csv
|-- output_6/csv
|-- output_7/csv
|-- output_8/csv
|-- output_9/csv
I found a shell loop that extracts every archive in a directory, but I don't know how to restrict it to the CSV folder in each file:
for f in *.tar.gz; do tar -xzvf "$f"; done
Upvotes: -1
Views: 934
Reputation: 77
Possible solution
After tinkering with the shell loop above, I managed to extract only the csv folders by adding a quoted csv wildcard pattern to the tar command:
for f in *.tar.gz; do tar -xzvf "$f" "*csv*" -C ../synthea_output_folder; done
The output now looks like this:
|-- output_1
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_10
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_11
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_12
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_2
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_3
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_4
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_5
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_6
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_7
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_8
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
`-- output_9
    `-- csv
        |-- allergies.csv
        |-- careplans.csv
        |-- conditions.csv
        |-- encounters.csv
        |-- immunizations.csv
        |-- medications.csv
        |-- observations.csv
        |-- patients.csv
        `-- procedures.csv
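For anyone who wants something a bit more defensive than the one-liner, here is a rough sketch of the same loop wrapped in a function. It is not tested against the real Synthea archives. It assumes GNU tar (which, as far as I know, needs --wildcards to treat a member name as a pattern when extracting) and that every archive stores its files under a path like output_N/csv/. The function name extract_csv_dirs and its two arguments are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: extract the csv folder from every .tar.gz in a source directory.
# Assumptions: GNU tar, and member paths of the form output_N/csv/<file>.
# extract_csv_dirs is a hypothetical helper name, not part of any tool.
extract_csv_dirs() {
    local src_dir=$1 dest_dir=$2 archive
    # Create the destination up front; tar -C fails if it does not exist.
    mkdir -p "$dest_dir" || return 1
    for archive in "$src_dir"/*.tar.gz; do
        # If nothing matched, the glob is left unexpanded; skip it.
        [ -e "$archive" ] || continue
        # -C is given before the pattern so it applies to the extracted members.
        # The single quotes keep the shell from expanding the pattern itself.
        tar -xzf "$archive" -C "$dest_dir" --wildcards '*/csv/*' || return 1
    done
}
```

Called as extract_csv_dirs . ../synthea_output_folder from the directory holding the archives, this should give the same layout as above. The pattern '*/csv/*' is narrower than "*csv*": it only matches members inside a directory named csv, rather than any path that happens to contain the letters csv. The function stops at the first archive where tar reports an error, which includes an archive with no csv folder at all.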
Upvotes: 0