itthrill
itthrill

Reputation: 1376

adding a column to an existing data in shell

I have a csv with three headers. I want to add an extra header called "tag" to it.

$ cat TEST1/a.csv
    h1,h2,h3
    a,b,c
    d,e,f
$ awk '{print $0}' TEST1/a.csv
    h1,h2,h3
    a,b,c
    d,e,f
$ awk '{print $0, "tag"}' TEST1/a.csv
     tag2,h3
     tagc
     tagf

However as shown above the current approach is giving garbage value. How can I get output like below:-

h1,h2,h3,tag
a,b,c,TEST1/a.csv
d,e,f,TEST1/a.csv

It would be great to have tag column containing the filename.

Upvotes: 0

Views: 75

Answers (2)

ghoti
ghoti

Reputation: 46896

I'm not sure why you're getting the results you show in your third awk line, and I'm not entirely sure what you want as the last field that you add, because your "expected results" don't really match the code you've provided. If your goal is to add the word "tag" as a last field on every line, then the following might work...

awk -F, '{$(NF+1)="tag"} 1' OFS=, TEST1/a.csv

This has the following bits:

  • -F, sets the field separator to a comma, compatible with your CSV.
  • $(NR+1) adds a new field to the end of each record.
  • 1 is shorthand to "print the current record".
  • OFS=, sets the output field separator to a comma.

There are a few ways you could construct the same logic, and all will provide roughly the same results.

This sets input and output field separators in the BEGIN block, and uses the addition of the field as the condition that prints the line.

awk 'BEGIN{FS=OFS=","} $(NF+1)="tag"' TEST1/a.csv

This dispenses with the idea of records and just adds text to each line.

awk '{$0=$0 ",tag"} 1' TEST1/a.csv

etc.

In general, if you're dealing with input in fields, I recommend using awk in a way that understands those fields, just in case you find yourself needing to manipulate fields instead of the stream in the future. If you wanted a stream editor, you could use sed.

sed 's/$/,tag/' TEST1/a.csv

IF on the other hand you want to add the name of the file to the end of every line, and have the text tag only in the header, you might do something like this:

awk 'NR==1 {$(NF+1)="tag"} NR>1 {$(NF+1)=FILENAME} 1' FS=, OFS=, TEST1/a.csv

This will generate the results you've shown, with the filename in the final field. You can do all sorts of variations based on the shape of your data, of course. If you're dealing with multiple files, each of which has a header on the first line, you might want this instead:

awk 'NR==1 {$(NF+1)="tag";print} FNR==1 {next} NR>1 {$(NF+1)=FILENAME} 1' FS=, OFS=, file1.csv file2.csv ...

The difference here is that the header is modified on the first line and printed, and then subsequent first-lines-of-files are skipped entirely.

Upvotes: 1

RavinderSingh13
RavinderSingh13

Reputation: 133780

By seeing OP's output I believe what OP needs is on 1st line it requires the tag string to be added to header and rest of the lines File's name with path should be added, if this is the case then try following. I have also taken care of control M \r characters from Input_file each line of it.

awk 'BEGIN{OFS=","} {gsub(/\r/,"")} FNR==1{print $0,"tag";next} {print $0,FILENAME}' TEST1/a.csv

Output will be as follows.

h1,h2,h3,tag
a,b,c,TEST1/a.csv
d,e,f,TEST1/a.csv


In case you want to remove control M characters first in Input_file and then run awk command then use following.

tr -d '\r' '' < Input_file > temp_file  &&  mv temp_file Input_file

Then run following awk command.

awk 'BEGIN{OFS=","}FNR==1{print $0,"tag";next} {print $0,FILENAME}' TEST1/a.csv

Upvotes: 1

Related Questions