user3252621

reading grep patterns from files

I have a text file containing the strings I want to grep for in another file, and I need each match along with the content that follows it. I am using

grep -A -f file1.txt file2.txt > output.txt

But it does not give the expected result. Where am I making a mistake?

Input file 1:

536911
536912
536920

Input file 2:

>gi|536911|CP006573.1|:c959-690 Mannheimia haemolytica D171, complete genome
ATGAAATGCGAACGTTTAGAAGAGTTATTAGAGTTACTTGGCGAACATTGGCGTAAAAATCCTGACTTAC
ACCTCATTGATATTTTGCAGCAGCTTTCAGTTGAAGTGGGCGAGCCTGATAATTTCAAAGCGTTAAGCGA
TGAAGTGTTAATCTATCAGCTTAAAATGCGAAATGCAGGCAAATTTGAGCCTATTCCCGGCATAAAAAAA
GATTATGAAGATGATTTTAAAACGGCTTTATTGCGAGCTCGTGGAATTTTAAACGATTAA
>gi|536912|gb|CP006573.1|:c6390-2194 Mannheimia haemolytica D171, complete genome
ATGAAGACCAAAACATTTACTCGTTCTTATCTTGCTTCTTTTGTAACAATCGTATTAAGTTTACCTGCTG
TAGCATCTGTTGTACGTAATGATGTGGACTATCAATACTTCCGCGATTTTGCCGAAAATAAAGGACCATT
TTCAGTTGGTTCAATGAATATTGATATTAAAGACAACAATGGACAACTTGTAGGCACGATGCTTCATAAT
TTACCAATGGTTGATTTTAGTGCTATGGTAAGAGGTGGATATTCTACTTTAATTGCACCACAATATTTAG
TTAGTGTTGCACATAATACTGGATATAAAAATGTTCAATTTGGTGCTGCAGGTTATAACCCTGATTCACA
TCACTATACTTATAAAATTGTTGACCGCAATGATTATGAAAAGGTTCAAGGAGGGTTGCACCCAGACTAT
>gi|536913|gb|CP006573.1|:7500-8540 Mannheimia haemolytica D171, complete genome
ATGTTTTATTCTAACAACCCTCTCATTAAACACAAGACCGGTTTATTAAATTTAGCAGAAGAACTGGGTA
ATATTTCTCAAGCCTGCAAAGTAATGGGAATGAGCCGAGATACATTCTATCGTTATCAACAAGCGGTTGA
GCAAGGTGGTGTTGAAGCATTGCTGAATCAAAATAGACGCGTTCCCAACTTAAAAAATCGTGTTGATGAG

Required output:

>gi|536911|CP006573.1|:c959-690 Mannheimia haemolytica D171, complete genome
ATGAAATGCGAACGTTTAGAAGAGTTATTAGAGTTACTTGGCGAACATTGGCGTAAAAATCCTGACTTAC
ACCTCATTGATATTTTGCAGCAGCTTTCAGTTGAAGTGGGCGAGCCTGATAATTTCAAAGCGTTAAGCGA
TGAAGTGTTAATCTATCAGCTTAAAATGCGAAATGCAGGCAAATTTGAGCCTATTCCCGGCATAAAAAAA
GATTATGAAGATGATTTTAAAACGGCTTTATTGCGAGCTCGTGGAATTTTAAACGATTAA
>gi|536912|gb|CP006573.1|:c6390-2194 Mannheimia haemolytica D171, complete genome
ATGAAGACCAAAACATTTACTCGTTCTTATCTTGCTTCTTTTGTAACAATCGTATTAAGTTTACCTGCTG
TAGCATCTGTTGTACGTAATGATGTGGACTATCAATACTTCCGCGATTTTGCCGAAAATAAAGGACCATT
TTCAGTTGGTTCAATGAATATTGATATTAAAGACAACAATGGACAACTTGTAGGCACGATGCTTCATAAT
TTACCAATGGTTGATTTTAGTGCTATGGTAAGAGGTGGATATTCTACTTTAATTGCACCACAATATTTAG
TTAGTGTTGCACATAATACTGGATATAAAAATGTTCAATTTGGTGCTGCAGGTTATAACCCTGATTCACA
TCACTATACTTATAAAATTGTTGACCGCAATGATTATGAAAAGGTTCAAGGAGGGTTGCACCCAGACTAT

How can I achieve this with grep or sed?

Thanks in advance

Upvotes: 2

Views: 91

Answers (2)

G. Cito

Reputation: 6378

You could remove the newlines and then make each record into a single line, so that grep sees each header and its sequence together:

cat file2.txt | tr -d '\n' | sed -e $'s/>gi/\\\n>gi/g' | grep -f file1.txt

Or paying heed to "useless use of cat" ;-)

tr -d '\n' < file2.txt | sed -e $'s/>gi/\\\n>gi/g' | grep -f file1.txt

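This is à la chomp and split in Perl. Note that each matching record comes out as a single line, with its sequence lines joined together.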
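If you want the matching records with their original line breaks, a variant of the same record-at-a-time idea is to let awk split the input on ">" itself instead of joining lines first. This is a minimal sketch, not the answer's method: it assumes a POSIX awk (single-character RS, and a command-line assignment between file operands), and the function name extract_records is hypothetical.

```shell
# Hypothetical helper: print whole records from a FASTA-style file whose
# second '|'-separated field appears in an ID file, keeping the original
# line breaks (unlike the tr/sed pipeline, which joins each record into one line).
# Assumes a POSIX awk.
extract_records() {
    ids=$1    # one ID per line, e.g. file1.txt
    fasta=$2  # records starting with '>gi|ID|...', e.g. file2.txt
    # The ID file is read line by line (default RS). The RS='>' assignment
    # placed between the two operands only takes effect for the second file,
    # which is then read one ">"-delimited record at a time.
    awk -F'|' '
        NR == FNR { want[$0]; next }      # remember every ID from the first file
        FNR > 1 && ($2 in want) {         # record 1 is the empty text before the first ">"
            printf ">%s", $0              # the record still ends with its own newline
        }
    ' "$ids" RS='>' "$fasta"
}
```

Usage with the question's files: extract_records file1.txt file2.txt > output.txt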

Upvotes: 0

nu11p01n73R

Reputation: 26667

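Since the number of lines following each matching header varies, the -A option won't help you: -A prints a fixed number of lines of trailing context after every match. In your command the number is missing altogether, so grep takes -f as the -A count and rejects it.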
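To see why one fixed count does not fit, here is a small sketch on hypothetical toy data (not the question's files; the names toy.fa and ids.txt are made up). Records 101 and 102 have two and three sequence lines, so -A 2 prints record 101 completely but loses the last line of record 102.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical toy input: record 101 has 2 sequence lines, 102 has 3, 103 has 1.
printf '%s\n' '>gi|101|x' ACGT ACGT '>gi|102|y' TTTT TTTT TTTT '>gi|103|z' GGGG > toy.fa
printf '%s\n' 101 102 > ids.txt

# -A needs a number, and the same count is applied after every matching line:
# 2 lines fits record 101 but cuts record 102 short.
grep -A 2 -f ids.txt toy.fa
```

With GNU grep this prints only the first six lines of toy.fa, so the third TTTT line of record 102 is missing. Raising the count would instead pull in lines from the following record whenever a record is shorter.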

An awk solution could look like this:

$ awk -F\| 'NR==FNR{pattern[$0];next} { if ($2 in pattern){flag=1} else if(NF > 1){flag=0}} flag' file1 file2
>gi|536911|CP006573.1|:c959-690 Mannheimia haemolytica D171, complete genome
ATGAAATGCGAACGTTTAGAAGAGTTATTAGAGTTACTTGGCGAACATTGGCGTAAAAATCCTGACTTAC
ACCTCATTGATATTTTGCAGCAGCTTTCAGTTGAAGTGGGCGAGCCTGATAATTTCAAAGCGTTAAGCGA
TGAAGTGTTAATCTATCAGCTTAAAATGCGAAATGCAGGCAAATTTGAGCCTATTCCCGGCATAAAAAAA
GATTATGAAGATGATTTTAAAACGGCTTTATTGCGAGCTCGTGGAATTTTAAACGATTAA
>gi|536912|gb|CP006573.1|:c6390-2194 Mannheimia haemolytica D171, complete genome
ATGAAGACCAAAACATTTACTCGTTCTTATCTTGCTTCTTTTGTAACAATCGTATTAAGTTTACCTGCTG
TAGCATCTGTTGTACGTAATGATGTGGACTATCAATACTTCCGCGATTTTGCCGAAAATAAAGGACCATT
TTCAGTTGGTTCAATGAATATTGATATTAAAGACAACAATGGACAACTTGTAGGCACGATGCTTCATAAT
TTACCAATGGTTGATTTTAGTGCTATGGTAAGAGGTGGATATTCTACTTTAATTGCACCACAATATTTAG
TTAGTGTTGCACATAATACTGGATATAAAAATGTTCAATTTGGTGCTGCAGGTTATAACCCTGATTCACA
TCACTATACTTATAAAATTGTTGACCGCAATGATTATGAAAAGGTTCAAGGAGGGTTGCACCCAGACTAT

What it does?

  • -F\| sets the field separator to |

  • NR==FNR{pattern[$0];next} stores each line of the first file as a key in the array pattern. NR==FNR is true only while awk reads the first file, file1, and next skips the remaining rules for those lines.

  • { if ($2 in pattern){flag=1} if the second field, $2, is a key in the array pattern, sets flag to 1

  • else if(NF > 1){flag=0}} sets flag to 0 only when the line is a header (it contains >gi|xxxxx|, so it has more than one field) whose ID is not in pattern. Sequence lines have a single field, so they leave flag unchanged.

  • flag is a bare pattern: when flag is nonzero it is true, and awk performs the default action of printing the entire line.
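The same one-liner can be spread over several lines with the explanations above as comments, which may be easier to adapt. This is a sketch of the same logic, not a different method: the wrapper name select_records is hypothetical, and it assumes the ID file has no blank lines (a blank ID would match the empty $2 of every sequence line).

```shell
# Hypothetical wrapper around the awk one-liner from the answer.
# $1: file with one ID per line (file1), $2: FASTA-style file (file2).
select_records() {
    awk -F'|' '
        # First file only: store each ID as a key of the array pattern.
        NR == FNR { pattern[$0]; next }

        # Second file: header lines contain "|" and so have NF > 1.
        # Sequence lines have NF == 1 and leave flag unchanged,
        # so they inherit the decision made for the last header.
        {
            if ($2 in pattern) flag = 1
            else if (NF > 1)   flag = 0
        }

        # Bare pattern: print the current line while flag is nonzero.
        flag
    ' "$1" "$2"
}
```

Usage: select_records file1 file2 > output.txt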

Upvotes: 1
