Reputation: 37
I am trying to use a .txt file with around 5000 patterns (one per line) to search through another file of 18000 lines for any matches. So far I've tried every form of grep and awk I can find on the internet and none of them return anything, so I am completely stumped.
Here's some text from each file.
Pattern.txt
rs2622590
rs925489
rs2798334
rs6801957
rs6801957
rs13137008
rs3807989
rs10850409
rs2798269
rs549182
There are no extra spaces or anything.
File.txt
snpid hg18chr bp a1 a2 zscore pval CEUmaf
rs3131972 1 742584 A G 0.289 0.7726 .
rs3131969 1 744045 A G 0.393 0.6946 .
rs3131967 1 744197 T C 0.443 0.658 .
rs1048488 1 750775 T C -0.289 0.7726 .
rs12562034 1 758311 A G -1.552 0.1207 0.09167
rs4040617 1 769185 A G -0.414 0.6786 0.875
rs4970383 1 828418 A C 0.214 0.8303 .
rs4475691 1 836671 T C -0.604 0.5461 .
rs1806509 1 843817 A C -0.262 0.7933 .
The file.txt was downloaded directly from a med directory.
I'm pretty new to UNIX so any help would be amazing!
Sorry edit: I have definitely tried every single thing you guys are recommending and the result is blank. Am I maybe missing a syntax issue or something in my text files?
P.P.S I know there are matches as doing individual greps works. I'll move this question to unix.stackexchange. Thanks for your answers guys I'll try them all out.
Issue solved: my files had DOS line endings (a carriage return before each newline). I didn't know about this before, so thank you to everyone who answered. For future users who run into this issue, here is the solution that worked:
dos2unix *
awk 'NR==FNR{p[$0];next} $1 in p' Patterns.txt File.txt > Output.txt
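For anyone who wants to confirm that carriage returns are the culprit, or who doesn't have dos2unix installed, here is a rough sketch using only grep and tr. The sample files and the temp directory are made up for illustration; they are not my real data.

```shell
# Sketch: detect and strip DOS carriage returns without dos2unix.
# Everything below works on throwaway sample files in a temp directory.
tmp=$(mktemp -d)
printf 'rs925489\r\nrs549182\r\n' > "$tmp/patterns_dos.txt"
printf 'snpid hg18chr bp\nrs925489 1 100\nrs3131972 1 742584\n' > "$tmp/file.txt"

# Count the lines that contain a carriage return (non-zero means CRLF endings).
cr_lines=$(grep -c "$(printf '\r')" "$tmp/patterns_dos.txt")
echo "lines with CR: $cr_lines"

# With the CRs present, each stored key ends in a CR and never equals field 1.
before=$(awk 'NR==FNR{p[$0];next} $1 in p' "$tmp/patterns_dos.txt" "$tmp/file.txt")

# Strip the CRs with tr, then rerun the same awk command.
tr -d '\r' < "$tmp/patterns_dos.txt" > "$tmp/patterns_unix.txt"
after=$(awk 'NR==FNR{p[$0];next} $1 in p' "$tmp/patterns_unix.txt" "$tmp/file.txt")

echo "before: [$before]"
echo "after:  [$after]"
```

The same invisible trailing character is presumably why the grep attempts came back blank as well: every pattern ended in a carriage return that the data lines did not have in that position.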
Upvotes: 0
Views: 2110
Reputation: 784998
You can use grep -Fw here:
grep -Fw -f Pattern.txt File.txt
Options used are:
-F - Fixed string search, to treat the input as non-regex
-w - Match full words only
-f file - Read patterns from a file
Upvotes: 3
Reputation: 203254
idk if it's what you want or not, but this will print every line from File.txt whose first field equals a string from Patterns.txt:
awk 'NR==FNR{p[$0];next} $1 in p' Patterns.txt File.txt
If that is not what you want, tell us what you do want. If it is what you want but doesn't produce the output you expect, then one or both of your files contains control characters courtesy of being created in Windows, so run dos2unix or similar on them both first.
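In case the one-liner is hard to read, here is the same logic spread out with comments, as a sketch run against made-up sample files. The sub() call is an addition that is not in the command above; it is there on the assumption that the pattern file may still have DOS line endings.

```shell
# Sample data in a temp directory; the first pattern line deliberately ends in CRLF.
tmp=$(mktemp -d)
printf 'rs925489\r\nrs549182\n' > "$tmp/Patterns.txt"
printf 'snpid hg18chr bp\nrs549182 2 200\nrs3131972 1 742584\nrs925489 1 100\n' > "$tmp/File.txt"

matches=$(awk '
    # NR counts lines across all files, FNR restarts for each file,
    # so NR == FNR is true only while reading the first file.
    NR == FNR {
        sub(/\r$/, "")   # addition: drop a trailing carriage return, if any
        p[$0]            # referencing p[$0] creates the key; no value is needed
        next             # skip the rule below for pattern-file lines
    }
    # Second file: print the line when its first field is a stored key.
    $1 in p
' "$tmp/Patterns.txt" "$tmp/File.txt")

echo "$matches"
```

Matching lines come out in the order they appear in the second file, not in the order of the patterns.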
Upvotes: 0
Reputation: 385
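Use a shell script that feeds every line of your pattern file to grep -F (the modern spelling of fgrep) and searches the other file for them.
#!/bin/bash
# Arguments: the pattern file first, then the file to search
PATTERNS=$1   # file with one pattern per line
TARGET=$2     # file to search
# grep reads the patterns from stdin (-f -); -F treats them as fixed strings
grep -F -f - "$TARGET" < "$PATTERNS"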
Upvotes: -1