Reputation: 1376
I have a csv with three headers. I want to add an extra header called "tag" to it.
$ cat TEST1/a.csv
h1,h2,h3
a,b,c
d,e,f
$ awk '{print $0}' TEST1/a.csv
h1,h2,h3
a,b,c
d,e,f
$ awk '{print $0, "tag"}' TEST1/a.csv
tag2,h3
tagc
tagf
However as shown above the current approach is giving garbage value. How can I get output like below:-
h1,h2,h3,tag
a,b,c,TEST1/a.csv
d,e,f,TEST1/a.csv
It would be great to have tag column containing the filename.
Upvotes: 0
Views: 75
Reputation: 46896
I'm not sure why you're getting the results you show in your third awk line, and I'm not entirely sure what you want as the last field that you add, because your "expected results" don't really match the code you've provided. If your goal is to add the word "tag" as a last field on every line, then the following might work...
awk -F, '{$(NF+1)="tag"} 1' OFS=, TEST1/a.csv
This has the following bits:
-F,
sets the field separator to a comma, compatible with your CSV.$(NR+1)
adds a new field to the end of each record.1
is shorthand to "print the current record".OFS=,
sets the output field separator to a comma.There are a few ways you could construct the same logic, and all will provide roughly the same results.
This sets input and output field separators in the BEGIN block, and uses the addition of the field as the condition that prints the line.
awk 'BEGIN{FS=OFS=","} $(NF+1)="tag"' TEST1/a.csv
This dispenses with the idea of records and just adds text to each line.
awk '{$0=$0 ",tag"} 1' TEST1/a.csv
etc.
In general, if you're dealing with input in fields, I recommend using awk in a way that understands those fields, just in case you find yourself needing to manipulate fields instead of the stream in the future. If you wanted a stream editor, you could use sed
.
sed 's/$/,tag/' TEST1/a.csv
IF on the other hand you want to add the name of the file to the end of every line, and have the text tag
only in the header, you might do something like this:
awk 'NR==1 {$(NF+1)="tag"} NR>1 {$(NF+1)=FILENAME} 1' FS=, OFS=, TEST1/a.csv
This will generate the results you've shown, with the filename in the final field. You can do all sorts of variations based on the shape of your data, of course. If you're dealing with multiple files, each of which has a header on the first line, you might want this instead:
awk 'NR==1 {$(NF+1)="tag";print} FNR==1 {next} NR>1 {$(NF+1)=FILENAME} 1' FS=, OFS=, file1.csv file2.csv ...
The difference here is that the header is modified on the first line and printed, and then subsequent first-lines-of-files are skipped entirely.
Upvotes: 1
Reputation: 133780
By seeing OP's output I believe what OP needs is on 1st line it requires the tag
string to be added to header and rest of the lines File's name with path should be added, if this is the case then try following. I have also taken care of control M \r
characters from Input_file each line of it.
awk 'BEGIN{OFS=","} {gsub(/\r/,"")} FNR==1{print $0,"tag";next} {print $0,FILENAME}' TEST1/a.csv
Output will be as follows.
h1,h2,h3,tag
a,b,c,TEST1/a.csv
d,e,f,TEST1/a.csv
In case you want to remove control M characters first in Input_file and then run awk
command then use following.
tr -d '\r' '' < Input_file > temp_file && mv temp_file Input_file
Then run following awk
command.
awk 'BEGIN{OFS=","}FNR==1{print $0,"tag";next} {print $0,FILENAME}' TEST1/a.csv
Upvotes: 1