SunFire

Reputation: 31

Grep list (file) from another file

I'm new to bash and trying to extract a list of patterns from a file:

File1.txt

ABC
BDF
GHJ

base.csv (tried comma separated and tab delimited)

line 1,,,,"hfhf,ferf,ju,ABC"
line 2 ,,,,,"ewy,trggt,gtg,ABC,RFR"
line 3 .."himk,n,hn.ujj., BDF"

etc

Desired output is something like

ABC
line 1..
line 2..(whole lines)
BDF
line 3..

and so on for each pattern from file 1

The code I tried was:

#!/bin/bash
for i in *.txt  # cycle through all files containing pattern lists
do
for q in "$i";  # cycle through list
do
echo $q >>output.${i}; 
grep -f "${q}" base.csv >>output.${i};
echo "\n";
done
done

But the output is only the filename and then a list of matching lines without the pattern names, e.g.

File1.txt
line 1...
line 2... 
line 3..

so I don't know which pattern each line belongs to and have to check and assign them manually. Can you please point out my errors? Thanks!

Upvotes: 3

Views: 4965

Answers (4)

SunFire

Reputation: 31

Thank you for your kind help, my friends. I tried both variants above but kept getting various errors ("do" expected) or misbehavior (it got the names of the pattern blocks, e.g. ABC, BDF, but no lines). I gave up for a while and then eventually tried another way. While the base goal was to cycle through pattern list files, search for the patterns in a huge file, and write out specific columns from the lines found, I simply wrote:

for i in *.txt  # cycle through files w/ patterns
do
  grep -F -f "$i" bigfile.csv >> "${i}.out1"  # greps all patterns from the current file
  cut -f 2,3,4,7 "${i}.out1" >> "${i}.out2"   # cuts columns of interest and writes them out to another file
done

I'm aware that this code should be improved using some fancy pipeline features, but it works perfectly as is; I hope it'll help somebody in a similar situation. You can easily add some echoes to write out the pattern list names, as I initially requested.
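For what it's worth, the two steps can indeed be folded into one pipeline, which removes the intermediate .out1 file. A minimal sketch, with made-up sample data (patterns.txt and bigfile.csv here are illustrative names, not the asker's real files):

```shell
# Made-up sample data: patterns.txt holds fixed strings, bigfile.csv is tab-delimited
printf 'ABC\n' > patterns.txt
printf 'k1\tcol2\tcol3\tcol4\tx\ty\tcol7\tABC\nk2\tn2\tn3\tn4\tx\ty\tn7\tZZZ\n' > bigfile.csv

for i in patterns.txt   # cycle through files w/ patterns
do
  # grep the fixed strings and cut the columns of interest in one pipeline
  grep -F -f "$i" bigfile.csv | cut -f 2,3,4,7 > "${i}.out"
done
```

Same result as the two-step version, one fewer temporary file per pattern list.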

Upvotes: 0

James Brown

Reputation: 37464

Here is one that splits the words from file2 into an array (word[]) (comma-separated, with quotes and spaces stripped off) and stores the record names (line 1 etc.) comma-separated:

awk '
NR==FNR {
    n=split($0,tmp,/[" ]*(,|$)[" ]*/)                                  # split words
    for(i=2;i<=n;i++)                                                  # after first
        if(tmp[i]!="")                                                 # non-empties
            word[tmp[i]]=word[tmp[i]] (word[tmp[i]]==""?"":",") tmp[1] # hash rownames
    record[tmp[1]]=$0                                                  # store records
    next
}
($1 in word) {                                                         # word found
    n=split(word[$1],tmp,",")                                          # get record names
    print $1 ":"                                                       # output word
    for(i=1;i<=n;i++)                                                  # and records
        print record[tmp[i]]
}' file2 file1

Output:

ABC:
line 1,,,,"hfhf,ferf,ju,ABC"
line 2 ,,,,,"ewy,trggt,gtg,ABC,RFR"
BDF:
line 3 .."himk,n,hn.ujj., BDF"

Upvotes: 0

tripleee

Reputation: 189948

grep can process multiple files in one go, and has the added bonus of indicating which file each match was found in.

grep -f File1.txt base.csv >output.txt
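For instance, recreating the question's sample data (with a hypothetical second data file, other.csv, added purely for illustration):

```shell
# Recreate the question's sample data; other.csv is made up for the demo
printf 'ABC\nBDF\n' > File1.txt
printf 'line 1,,,,"hfhf,ferf,ju,ABC"\n' > base.csv
printf 'line 9,,,,"qq,BDF"\n' > other.csv

# With more than one input file, grep prefixes each match with its filename
grep -f File1.txt base.csv other.csv
```

Each output line starts with the file it came from, e.g. base.csv: or other.csv:.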

It's not clear what you hope the inner loop will do; for q in "$i" just loops over the single filename in $i, so it's not really a loop at all.

If you want the output to be grouped per pattern, here's a while loop which looks for one pattern at a time:

while read -r pat; do
    echo "$pat"
    grep "$pat" base.csv
done <File1.txt >output.txt

But the most efficient way to tackle this is to write a simple Awk script which processes all the input files at once, and groups the matches before printing them.

An additional concern is anchoring. grep "ABC" will find a match in 123DEABCXYZ; is this something you want to avoid? You can improve the regex, or, again, turn to Awk which gives you more control over where exactly to look for a match in a structured line.

awk '# Read patterns into memory
    NR==FNR { a[++i] = $1; next }
    # Loop across patterns
    { for(j=1; j<=i; ++j)
        if($0 ~ a[j]) {
            print FILENAME ":" FNR ":" $0 >> ("output." a[j])
            next }
    }' File1.txt base.csv
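To illustrate the anchoring point with grep alone: the -w flag restricts matches to whole words, which already filters out matches embedded inside longer tokens. A small demo (sample.csv is made up):

```shell
# One line with ABC embedded in a longer token, one with ABC as its own field
printf '123DEABCXYZ\nfoo,ABC,bar\n' > sample.csv

grep -c 'ABC' sample.csv    # plain grep matches both lines: 2
grep -cw 'ABC' sample.csv   # -w skips the embedded ABC: 1
```

This is cruder than field-aware matching in Awk, but often good enough for comma-delimited data, since commas count as word boundaries.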

Upvotes: 1

jvdmr

Reputation: 723

You're not actually reading the files, you're just handling the filenames. Try this:

#!/bin/bash
for i in *.txt # cycle through all files containing pattern lists
do
  while read -r q # read file line by line
  do
    echo "$q" >>"output.${i}"
    grep -e "$q" base.csv >>"output.${i}" # -e: $q is a single pattern, not a pattern file
    echo >>"output.${i}" # blank separator line
  done < "${i}"
done
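A quick self-contained check of this approach against the question's sample data (using grep -e, since each $q read from the file is a single pattern rather than a pattern file):

```shell
# Recreate the question's two files
printf 'ABC\nBDF\n' > File1.txt
printf 'line 1,,,,"hfhf,ferf,ju,ABC"\nline 2 ,,,,,"ewy,trggt,gtg,ABC,RFR"\nline 3 .."himk,n,hn.ujj., BDF"\n' > base.csv

# Read File1.txt line by line; print each pattern, then its matching lines
while read -r q
do
  echo "$q" >>output.File1.txt
  grep -e "$q" base.csv >>output.File1.txt
done < File1.txt

cat output.File1.txt
```

The output groups the matching lines under ABC and BDF, which is exactly the grouping the asker wanted.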

Upvotes: 0
