Reputation: 5148
I have a very large CSV file, input.csv, that looks like this:
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
I am trying to save the contents (all the columns) of this file based on the URL in the first column into separate files.
So the output for the above snippet should be two files:
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
and
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
To split this file based on the first column, I am using awk thus:
awk -F, '{print >> ($1".csv")}' input.csv
However, I am unable to save to any file based on the URL field because of this error:
awk: cmd. line:1: (FILENAME=input.csv FNR=1) fatal: can't redirect to ` https://www.youtube.com/watch?v=9t5V_sMVN5I.csv' (No such file or directory)
Saving a file using the URL-style string as the filename is apparently what causes the error: the many '/' characters are presumably being treated as directory separators in the file path.
Is there any way to save the contents based on column 1 ($1) using awk, but such that the output files are named differently, perhaps following a numbered sequence 1..N? The other option is to replace every URL with some unique identifier and then split on that; however, I have not yet been able to script this up.
Any help would be appreciated!
Upvotes: 1
Views: 469
Reputation: 23667
Since the first column has a regular format, with the string after = serving as a unique identifier, we can use that string as the filename:
awk -F, '{split($1,a,"="); print > (a[2]".csv")}' input.csv
$ cat b7kKTSVbfdA.csv
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
$ cat 9t5V_sMVN5I.csv
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
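If numbered output files (1..N) are preferred, as the question suggests, one possible sketch is to keep an awk array that assigns each distinct URL the next free number. The array id, the counter n, and the index.csv mapping file are names made up for this illustration, and the sample rows only mirror the question's data:

```shell
# Work in a scratch directory so the numbered files do not clutter anything else.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data in the same shape as input.csv from the question.
cat > input.csv <<'EOF'
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.66, 0.7, 89
EOF

# id[] maps each distinct URL to the next free number (1, 2, ...).
# index.csv (hypothetical name) records which number belongs to which URL.
awk -F, '
    !($1 in id) { id[$1] = ++n; print n "," $1 > "index.csv" }
    { print > (id[$1] ".csv") }
' input.csv
```

With the sample rows this writes 1.csv (three rows), 2.csv (two rows), and an index.csv that maps each number back to its URL.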
Upvotes: 1
Reputation: 141
Because your filename contains the '/' character, you can use the method below, which strips everything up to and including the '=' before building the filename:
awk -F, '{filename=$1;sub(".*=","",filename);print >> (filename".csv")}' input.csv
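For a very large input with many distinct URLs, some awk implementations may run out of open file descriptors (gawk is documented to work around this, others may not). A hedged variant of the command above closes the previous output file whenever the key changes; prev and out are names chosen here for illustration, and the sample rows are arranged to show interleaved keys:

```shell
# Scratch directory and sample data, so the sketch can be run on its own.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > input.csv <<'EOF'
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.66, 0.7, 89
https://www.youtube.com/watch?v=b7kKTSVbfdA, 0.56, 0.98, 87
https://www.youtube.com/watch?v=9t5V_sMVN5I, 0.56, 0.98, 87
EOF

awk -F, '
    {
        filename = $1
        sub(/.*=/, "", filename)       # keep only the text after the last "="
        out = filename ".csv"
        if (out != prev) {             # key changed: release the previous file
            if (prev != "") close(prev)
            prev = out
        }
        print >> out                   # ">>" appends, so a reopened file keeps its earlier rows
    }
' input.csv
```

Because the files are opened with >>, rerunning the command appends to files left over from an earlier run, so remove old outputs first.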
Upvotes: 0