Reputation: 1429
My csv file has multiple rows of data and I want to split it into multiple files based on one attribute.
beeline -u jdbc:hive2:<MYHOST> -n <USER> -p <PASSWORD> --silent=true --outputformat=csv2 -f <SQL FILE> > result_+%Y%m%d_%H%M%S.csv
SQL code with ORDER BY ID
is triggered from beeline which creates single CSV.
cat sql.csv "attr;attr;ID;attr" "data;data;XXXX;date" "data;data;XXXX;date" "data;data;YYYYY;date" "data;data;YYYYY;date" "data;data;BBBBB;date" "data;data;BBBBB;date"
Desired result is to split once new ID
is recognised and use that ID
in filename.
file_1_ID_XXXX_+%Y%m%d_%H%M%S
:
attr attr ID attr data data XXXX date data data XXXX date
file_2_ID_YYYYY_+%Y%m%d_%H%M%S
:
attr attr ID attr data data YYYYY date data data YYYYY date
Upvotes: 0
Views: 1997
Reputation: 84531
If I understand your question, you can take the csv file produced by sql and then split that into the 3 files you show simply by using a few variables, string concatenation and then by redirecting to the output files, e.g.
awk -v field=a -v n=1 -v dt=$(date '+%Y%m%d_%H%M%S') '
FNR == 1 {hdg=$0; next}
a != $3 {a = $3; name="file_"n"_ID_"a"_"dt; n++; print hdg > name}
{print $0 > name}
' sqldata
Example Input File
Where your sqldata
file contains:
$ cat sqldata
attr attr ID attr
data data XXXX date
data data XXXX date
data data YYYYY date
data data YYYYY date
data data BBBBB date
data data BBBBB date
Example Use/Output Files
Simply copying and middle-mouse pasting awk script into the terminal with the correct filename to read would produce the following three output files:
$ cat file_1_ID_XXXX_20190805_033514
attr attr ID attr
data data XXXX date
data data XXXX date
$ cat file_2_ID_YYYYY_20190805_033514
attr attr ID attr
data data YYYYY date
data data YYYYY date
$ cat file_3_ID_BBBBB_20190805_033514
attr attr ID attr
data data BBBBB date
data data BBBBB date
Look things over and let me know if this is what you intended. If not, let me know and I'm happy to help further.
Upvotes: 2