Ahk86
Ahk86

Reputation: 177

Bash Extract specific columns from TSV file to new file and add extra column with header of extracted column

I have a file like this

name  AD01392 AV93289 AG98023 A723928
xyb323 232    3232    8992    3278
xyb897 343    434     344     3434 
xyvg87 439    3434    2434    43434

I need to extract the information in this file to 4 separate files that extract column 1 and iterate over columns 2,3 and 4 then add another column with the header of the extracted column Like so

File 1

name  AD01392 
xyb323 232    AD01392
xyb897 343    AD01392
xyvg87 439    AD01392

File 2

name  AV93289 
xyb323 3232   AV93289
xyb897 434    AV93289  
xyvg87 3434   AV93289 

and so on for the other 2 files.

I can extract the columns with

awk '{print $1,$2}' file.tsv > file1.tsv
awk '{print $1,$3}' file.tsv > file2.tsv
awk '{print $1,$4}' file.tsv > file3.tsv
awk '{print $1,$5}' file.tsv > file5.tsv

but not sure how I can extract the first row from each column and add that as a column to all rows.

Thank you a lot for your help and contributions.

Upvotes: 2

Views: 2181

Answers (2)

RavinderSingh13
RavinderSingh13

Reputation: 133458

Could you please try following awk and let me know if this helps you.

awk 'FNR==1{a[1]=$1;for(i=2;i<=NF;i++){a[i]=$i;print a[1],a[i] > "File"++q;}next} {for(j=2;j<=NF;j++){print $1,$j,a[j] > "File"++o};o=""}'  Input_file

Adding a non-one liner form of solution too now.

awk '
FNR==1{
  a[1]=$1;
  for(i=2;i<=NF;i++){
    a[i]=$i;
    print a[1],a[i] > "File"++q}
  next}
{
  for(j=2;j<=NF;j++){
    print $1,$j,a[j] > "File"++o}
  o=""
}
'  Input_file

Upvotes: 2

Allan
Allan

Reputation: 12438

You can use the following:

awk 'NR==1{save=$2;print $1,$2}NR>1{print $1,$2,save}' file.tsv > file1.tsv
awk 'NR==1{save=$3;print $1,$3}NR>1{print $1,$3,save}' file.tsv > file2.tsv
awk 'NR==1{save=$4;print $1,$4}NR>1{print $1,$4,save}' file.tsv > file3.tsv
awk 'NR==1{save=$5;print $1,$5}NR>1{print $1,$5,save}' file.tsv > file5.tsv

to print the last column where you just save the content of $2, $3, $4, $5 in a tmp variable that you will reuse during the whole file process. By default the output field separator is space but you can change it by overwriting the content of the variable OFS depending on your need, so if you want for example to have \t as separator

awk 'NR==1{OFS="\t";save=$2;print $1,$2}NR>1{print $1,$2,save}' file.tsv > file1.tsv
awk 'NR==1{OFS="\t";save=$3;print $1,$3}NR>1{print $1,$3,save}' file.tsv > file2.tsv
awk 'NR==1{OFS="\t";save=$4;print $1,$4}NR>1{print $1,$4,save}' file.tsv > file3.tsv
awk 'NR==1{OFS="\t";save=$5;print $1,$5}NR>1{print $1,$5,save}' file.tsv > file5.tsv

IMPROVEMENT:

I would also recommend to use the following awk command in order to process the file one and only one time (imagine you have big file that take 2 hours to be processed)

awk 'BEGIN{OFS="\t"}NR==1{save2=$2;save3=$3;save4=$4;save5=$5;print $1,$2> "file1.tsv";print $1,$3 > "file2.tsv"; print $1,$4 > "file3.tsv"; print $1,$5 > "file5.tsv";}NR>1{print $1,$2,save2 > "file1.tsv"; print $1,$3,save3 > "file2.tsv"; print $1,$4,save4 > "file3.tsv"; print $1,$5,save5 > 
"file5.tsv";}' file.tsv

where the awk code is:

BEGIN{
    OFS="\t"
}

NR==1{
    save2=$2;
    save3=$3;
    save4=$4;
    save5=$5;
    print $1,$2> "file1.tsv";
    print $1,$3 > "file2.tsv";
    print $1,$4 > "file3.tsv";
    print $1,$5 > "file5.tsv";
}

NR>1{
    print $1,$2,save2 > "file1.tsv";
    print $1,$3,save3 > "file2.tsv";
    print $1,$4,save4 > "file3.tsv";
    print $1,$5,save5 > "file5.tsv";
}

TESTED:

$ more file?.tsv
::::::::::::::
file1.tsv
::::::::::::::
name    AD01392
xyb323  232     AD01392
xyb897  343     AD01392
xyvg87  439     AD01392
::::::::::::::
file2.tsv
::::::::::::::
name    AV93289
xyb323  3232    AV93289
xyb897  434     AV93289
xyvg87  3434    AV93289
::::::::::::::
file3.tsv
::::::::::::::
name    AG98023
xyb323  8992    AG98023
xyb897  344     AG98023
xyvg87  2434    AG98023
::::::::::::::
file5.tsv
::::::::::::::
name    A723928
xyb323  3278    A723928
xyb897  3434    A723928
xyvg87  43434   A723928

Upvotes: 2

Related Questions