Reputation: 9337
I have a long list of CSV files that I use for data mining (new files come in daily). Each file name contains the date the file was created. I need to parse the date out of the file name and add it as a new column to each row in the file (updating the header line would also be nice).
So if I have a file named cx3-2016-04-01.csv
with the following content:
country,os,os_ver,oem,model
CN,A,6.0,Xiaomi,MI NOTE
US,A,6.0,LGE,LGLS7700
CN,A,6.0,Xiaomi,MI 4LTE
US,A,6.0,LGE,LGUS991
US,A,6.0,LGE,LGUS991
I want the output to look like:
date,country,os,os_ver,oem,model
2016-04-01,CN,A,6.0,Xiaomi,MI NOTE
2016-04-01,US,A,6.0,LGE,LGLS7700
2016-04-01,CN,A,6.0,Xiaomi,MI 4LTE
2016-04-01,US,A,6.0,LGE,LGUS991
2016-04-01,US,A,6.0,LGE,LGUS991
Can this be done with standard Linux command-line tools in a single command or command chain (but not with a script), and if so, how?
Upvotes: 0
Views: 2261
Reputation: 136
Try this awk command.
Run it from the directory where the file is stored, or give the file name with its path. In the command below I pass just the file name ( cx3-2016-04-01.csv ) as the last argument.
awk 'FNR == 1 { date = FILENAME; sub(/^.*cx3-/, "", date); sub(/\.csv$/, "", date); print "date,country,os,os_ver,oem,model"; next } { print date "," $0 }' cx3-2016-04-01.csv
How it works
The first line is replaced by a hard-coded header ( date,country,os,os_ver,oem,model ).
For every other line, the "cx3-" prefix and the ".csv" suffix are stripped from the file name, and what remains (the date) is added to the start of the line, followed by a comma.
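The stripping step described above can be tried on its own, without touching any data. This is only a sketch: the path data/cx3-2016-04-01.csv and the variable names f, name and date are made up for illustration, and the patterns are a slightly stricter variation (anchored, with the dot escaped) so that a leading directory is dropped and only a literal ".csv" at the end is removed.

```shell
# Hypothetical path, used only to illustrate the stripping step.
f="data/cx3-2016-04-01.csv"

# Remove everything up to and including "cx3-", then a trailing ".csv".
date=$(awk -v name="$f" 'BEGIN {
  sub(/^.*cx3-/, "", name)   # drops "data/cx3-"
  sub(/\.csv$/, "", name)    # drops the literal ".csv" suffix
  print name
}')

echo "$date"
```

With the sample path this prints 2016-04-01.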
Here is the output the above command produces.
date,country,os,os_ver,oem,model
2016-04-01,CN,A,6.0,Xiaomi,MI NOTE
2016-04-01,US,A,6.0,LGE,LGLS7700
2016-04-01,CN,A,6.0,Xiaomi,MI 4LTE
2016-04-01,US,A,6.0,LGE,LGUS991
2016-04-01,US,A,6.0,LGE,LGUS991
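Since new files arrive daily, the same idea can be applied to many files at once. The following is a rough sketch rather than a tested production command: the function name add_date_column and the separate output directory are assumptions made for illustration, and it assumes every file is named cx3-YYYY-MM-DD.csv. Instead of hard-coding the header, it prepends "date," to whatever header each file already has.

```shell
# Sketch: add_date_column OUTDIR FILE...
# Writes a copy of each FILE into OUTDIR with the date from its name
# prepended as a new first column. Function name and OUTDIR are hypothetical.
add_date_column() {
  outdir=$1; shift
  mkdir -p "$outdir" || return 1
  for f in "$@"; do
    [ -f "$f" ] || continue        # skip unmatched globs and non-files
    base=${f##*/}                  # strip any leading directory
    d=${base#cx3-}                 # strip the assumed "cx3-" prefix
    d=${d%.csv}                    # strip the ".csv" suffix
    awk -v d="$d" '
      FNR == 1 { print "date," $0; next }   # extend the existing header
               { print d "," $0 }           # prepend the date to data rows
    ' "$f" > "$outdir/$base"
  done
}
```

It could then be called as, for example, add_date_column dated cx3-*.csv, which leaves the original files untouched and writes the extended copies into the dated directory.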
Upvotes: 1