Reputation: 177
I have a file like this
name AD01392 AV93289 AG98023 A723928
xyb323 232 3232 8992 3278
xyb897 343 434 344 3434
xyvg87 439 3434 2434 43434
I need to extract the information in this file to 4 separate files that extract column 1 and iterate over columns 2,3 and 4 then add another column with the header of the extracted column Like so
File 1
name AD01392
xyb323 232 AD01392
xyb897 343 AD01392
xyvg87 439 AD01392
File 2
name AV93289
xyb323 3232 AV93289
xyb897 434 AV93289
xyvg87 3434 AV93289
and so on for the other 2 files.
I can extract the columns with
awk '{print $1,$2}' file.tsv > file1.tsv
awk '{print $1,$3}' file.tsv > file2.tsv
awk '{print $1,$4}' file.tsv > file3.tsv
awk '{print $1,$5}' file.tsv > file5.tsv
but not sure how I can extract the first row from each column and add that as a column to all rows.
Thank you a lot for your help and contributions.
Upvotes: 2
Views: 2181
Reputation: 133458
Could you please try following awk
and let me know if this helps you.
awk 'FNR==1{a[1]=$1;for(i=2;i<=NF;i++){a[i]=$i;print a[1],a[i] > "File"++q;}next} {for(j=2;j<=NF;j++){print $1,$j,a[j] > "File"++o};o=""}' Input_file
Adding a non-one liner form of solution too now.
awk '
FNR==1{
a[1]=$1;
for(i=2;i<=NF;i++){
a[i]=$i;
print a[1],a[i] > "File"++q}
next}
{
for(j=2;j<=NF;j++){
print $1,$j,a[j] > "File"++o}
o=""
}
' Input_file
Upvotes: 2
Reputation: 12438
You can use the following:
awk 'NR==1{save=$2;print $1,$2}NR>1{print $1,$2,save}' file.tsv > file1.tsv
awk 'NR==1{save=$3;print $1,$3}NR>1{print $1,$3,save}' file.tsv > file2.tsv
awk 'NR==1{save=$4;print $1,$4}NR>1{print $1,$4,save}' file.tsv > file3.tsv
awk 'NR==1{save=$5;print $1,$5}NR>1{print $1,$5,save}' file.tsv > file5.tsv
to print the last column where you just save the content of $2
, $3
, $4
, $5
in a tmp variable that you will reuse during the whole file process. By default the output field separator is space but you can change it by overwriting the content of the variable OFS
depending on your need, so if you want for example to have \t
as separator
awk 'NR==1{OFS="\t";save=$2;print $1,$2}NR>1{print $1,$2,save}' file.tsv > file1.tsv
awk 'NR==1{OFS="\t";save=$3;print $1,$3}NR>1{print $1,$3,save}' file.tsv > file2.tsv
awk 'NR==1{OFS="\t";save=$4;print $1,$4}NR>1{print $1,$4,save}' file.tsv > file3.tsv
awk 'NR==1{OFS="\t";save=$5;print $1,$5}NR>1{print $1,$5,save}' file.tsv > file5.tsv
IMPROVEMENT:
I would also recommend to use the following awk
command in order to process the file one and only one time (imagine you have big file that take 2 hours to be processed)
awk 'BEGIN{OFS="\t"}NR==1{save2=$2;save3=$3;save4=$4;save5=$5;print $1,$2> "file1.tsv";print $1,$3 > "file2.tsv"; print $1,$4 > "file3.tsv"; print $1,$5 > "file5.tsv";}NR>1{print $1,$2,save2 > "file1.tsv"; print $1,$3,save3 > "file2.tsv"; print $1,$4,save4 > "file3.tsv"; print $1,$5,save5 >
"file5.tsv";}' file.tsv
where the awk
code is:
BEGIN{
OFS="\t"
}
NR==1{
save2=$2;
save3=$3;
save4=$4;
save5=$5;
print $1,$2> "file1.tsv";
print $1,$3 > "file2.tsv";
print $1,$4 > "file3.tsv";
print $1,$5 > "file5.tsv";
}
NR>1{
print $1,$2,save2 > "file1.tsv";
print $1,$3,save3 > "file2.tsv";
print $1,$4,save4 > "file3.tsv";
print $1,$5,save5 > "file5.tsv";
}
TESTED:
$ more file?.tsv
::::::::::::::
file1.tsv
::::::::::::::
name AD01392
xyb323 232 AD01392
xyb897 343 AD01392
xyvg87 439 AD01392
::::::::::::::
file2.tsv
::::::::::::::
name AV93289
xyb323 3232 AV93289
xyb897 434 AV93289
xyvg87 3434 AV93289
::::::::::::::
file3.tsv
::::::::::::::
name AG98023
xyb323 8992 AG98023
xyb897 344 AG98023
xyvg87 2434 AG98023
::::::::::::::
file5.tsv
::::::::::::::
name A723928
xyb323 3278 A723928
xyb897 3434 A723928
xyvg87 43434 A723928
Upvotes: 2