Reputation: 909
I have a tab delimited file which have 2 columns as:
new.txt
1.01 yes
2.00 no
0.93 no
1.2223 yes
1.7211 no
I want to modify the contents of it as if there are two categories as:
new_categorized.txt
yes no
1.01 2.00
1.2223 0.93
1.7211
I have found a similar question with an answer in R (here) ,however I need to do it with bash or awk.. I would appreciate your help.
Upvotes: 1
Views: 206
Reputation: 92854
GNU awk
solution:
awk '{ a[$2][($2=="yes"? ++y : ++n)]=$1 }
END{
max=(y > n? y:n);
print "yes","no";
for(i=1; i<=max; i++) print a["yes"][i], a["no"][i]
}' OFS='\t' file | column -tn
The output:
yes no
1.01 2.00
1.2223 0.93
1.7211
Upvotes: 2
Reputation: 203645
$ cat tst.awk
BEGIN { FS=OFS="\t" }
!($2 in label2colNr) {
label2colNr[$2] = ++numCols
colNr2label[numCols] = $2
}
{
colNr = label2colNr[$2]
val[++numRows[colNr],colNr] = $1
maxRows = (numRows[colNr] > maxRows ? numRows[colNr] : maxRows)
}
END {
for (colNr=1; colNr <= numCols; colNr++) {
printf "%s%s", colNr2label[colNr], (colNr<numCols ? OFS : ORS)
}
for (rowNr=1; rowNr <= maxRows; rowNr++) {
for (colNr=1; colNr <= numCols; colNr++) {
printf "%s%s", val[rowNr,colNr], (colNr<numCols ? OFS : ORS)
}
}
}
$ awk -f tst.awk file
yes no
1.01 2.00
1.2223 0.93
1.7211
The above will work with any awk in any shell on any UNIX system no matter how many categories you have in the 2nd field and no matter what their values are.
Upvotes: 3
Reputation: 88646
With bash, GNU grep and paste:
echo -e "yes\tno"
paste <(grep -Po '^\t\K.*(?=\tyes)' new.txt) <(grep -Po '^\t\K.*(?=\tno)' new.txt)
Output:
yes no 1.01 2.00 1.2223 0.93 1.7211
Upvotes: 2