justaguy

Reputation: 3022

awk to output matches to separate files

I am trying to group the lines whose text in $2 matches and write each group to a separate file, with the match used as the name of the new file. Since the actual files are quite large, I open each output file and then close it to save on speed and memory. My attempt is below. Thank you :).

awk '{printf "%s\n", $2==$2".txt"; close($2".txt")}' input.txt **'{ print $2 > "$2.txt" }'**

input.txt

chr19:41848059-41848167 TGFB1:exon.2;TGFB1:exon.3;TGFB1:exon.4 284.611 108 bases
chr15:89850833-89850913 FANCI:exon.20;FANCI:exon.27;FANCI:exon.32;FANCI:exon.33;FANCI:exon.34 402.012 80 bases
chr15:31210356-31210508 FANC1:exon.6;FANC1:exon.7 340.914 152 bases
chr19:41850636-41850784 TGFB1:exon.1;TGFB1:exon.2;TGFB1:exon.3 621.527 148 bases

Desired output for TGFB1.txt

chr19:41848059-41848167 TGFB1:exon.2;TGFB1:exon.3;TGFB1:exon.4 284.611 108 bases
chr19:41850636-41850784 TGFB1:exon.1;TGFB1:exon.2;TGFB1:exon.3 621.527 148 bases

Desired output for FANC1.txt

chr15:89850833-89850913 FANCI:exon.20;FANCI:exon.27;FANCI:exon.32;FANCI:exon.33;FANCI:exon.34 402.012 80 bases
chr15:31210356-31210508 FANC1:exon.6;FANC1:exon.7 340.914 152 bases

EDIT:

awk -F '[ :]' '{f = $3 ".txt";  close($3 ".txt")} print > f}' BMF_unix_loop_genes_IonXpress_008_150902_loop_genes_average_IonXpress_008_150902.bed > /home/cmccabe/Desktop/panels/BMF **/"$f".txt;**
bash: /home/cmccabe/Desktop/panels/BMF: Is a directory

Upvotes: 0

Views: 156

Answers (2)

glenn jackman

Reputation: 247220

You can redefine the field separator to include a colon; the file name is then in $3:

awk -F '[ :]' '{f = $3 ".txt"; print > f}' input.txt

I've encountered problems with some awks where constructing the filename directly to the right of the redirection is problematic, which is why I build it in a variable first. However, the Friday afternoon beer cart has been around and I can't recall the specific details :/
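
This may be the issue in question: the gawk manual warns that an unparenthesized concatenation after `>` is rejected by some Unix awks, and recommends wrapping the target in parentheses. A minimal sketch of that form, assuming the four sample lines from the question; the scratch directory is only there so the snippet runs on its own:

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)" || exit 1

# Recreate the sample input from the question.
cat > input.txt <<'EOF'
chr19:41848059-41848167 TGFB1:exon.2;TGFB1:exon.3;TGFB1:exon.4 284.611 108 bases
chr15:89850833-89850913 FANCI:exon.20;FANCI:exon.27;FANCI:exon.32;FANCI:exon.33;FANCI:exon.34 402.012 80 bases
chr15:31210356-31210508 FANC1:exon.6;FANC1:exon.7 340.914 152 bases
chr19:41850636-41850784 TGFB1:exon.1;TGFB1:exon.2;TGFB1:exon.3 621.527 148 bases
EOF

# Parentheses make the redirection target a single, unambiguous expression.
awk -F '[ :]' '{ print > ($3 ".txt") }' input.txt
```

Either this or the variable version should behave the same; the variable is handy when the name is needed again later, for example in a `close()`.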

I wouldn't bother closing the files unless you're expecting hundreds or thousands of new files to be produced.
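
If the number of output files does get that large, here is a rough sketch of one way to close as you go. It assumes the sample `input.txt` from the question and an empty working directory; `prev` is just an illustrative variable name. It appends with `>>`, because a file reopened with `>` after a `close()` would be truncated:

```shell
# Work in a scratch directory: ">>" would otherwise append to leftovers
# from an earlier run.
cd "$(mktemp -d)" || exit 1

cat > input.txt <<'EOF'
chr19:41848059-41848167 TGFB1:exon.2;TGFB1:exon.3;TGFB1:exon.4 284.611 108 bases
chr15:89850833-89850913 FANCI:exon.20;FANCI:exon.27;FANCI:exon.32;FANCI:exon.33;FANCI:exon.34 402.012 80 bases
chr15:31210356-31210508 FANC1:exon.6;FANC1:exon.7 340.914 152 bases
chr19:41850636-41850784 TGFB1:exon.1;TGFB1:exon.2;TGFB1:exon.3 621.527 148 bases
EOF

awk -F '[ :]' '
{
    f = $3 ".txt"
    if (f != prev) {                 # output name changed
        if (prev != "") close(prev)  # release the previous file
        prev = f
    }
    print >> f                       # append, so a reopened file keeps its lines
}' input.txt
```

Only one output file is open at a time this way. If the input is sorted by gene first, each file is opened exactly once.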

Upvotes: 2

karakfa

Reputation: 67567

You need to split the second field to get the desired file name. This should do:

$ awk '{split($2,f,":"); p=f[1]".txt"; print $0 > p }' file

Note that it won't reproduce your desired output exactly, since one of the gene names in your input has a typo (FANCI vs. FANC1), so those two lines end up in different files:

$ ls *.txt
FANC1.txt  FANCI.txt  TGFB1.txt
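
To catch near-duplicate names like these before splitting, one option is to count lines per gene name first. This is a small sketch assuming the question's sample data saved as `input.txt`; `gene_counts.txt` is an arbitrary name picked for illustration:

```shell
# Work in a scratch directory so the snippet runs on its own.
cd "$(mktemp -d)" || exit 1

cat > input.txt <<'EOF'
chr19:41848059-41848167 TGFB1:exon.2;TGFB1:exon.3;TGFB1:exon.4 284.611 108 bases
chr15:89850833-89850913 FANCI:exon.20;FANCI:exon.27;FANCI:exon.32;FANCI:exon.33;FANCI:exon.34 402.012 80 bases
chr15:31210356-31210508 FANC1:exon.6;FANC1:exon.7 340.914 152 bases
chr19:41850636-41850784 TGFB1:exon.1;TGFB1:exon.2;TGFB1:exon.3 621.527 148 bases
EOF

# Count how many lines each gene name (text before the first ":" in $2) has,
# then sort by name so similar spellings sit next to each other.
awk '{ split($2, f, ":"); n[f[1]]++ }
     END { for (g in n) print n[g], g }' input.txt |
  LC_ALL=C sort -k2 > gene_counts.txt

cat gene_counts.txt
```

Names that differ by a single character, such as `FANC1` and `FANCI`, show up on adjacent lines and are easy to spot.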

Upvotes: 2
