chrys

Reputation: 85

grep on one file and sort matches to several output files

My question concerns the following. I have this file:

    FileA:
Peter Programmer
Frank Chemist
Charles Physicist
John Programmer
Alex Programmer
Harold Chemist
George Chemist

I extracted all the job titles from FileA and saved them as a list of unique values (FileB).

  FileB:
Programmer
Chemist
Physcist

(Assume FileA goes on and on, with many more people and redundant information.)

What I want to do now is get all the job classes from FileA and create a new file for each Job-Class so that in the end I have:

FileProgrammer
Peter Programmer
John  Programmer
Alex  Programmer

FileChemist
Frank Chemist
Harold Chemist
George Chemist

FilePhysicist
Charles Physicist

I want to grep the pattern of the job name from the list in the Jobs File and create a new file for every job name which exists in the other original file.

In reality, I have 56 unique elements in my list, and the original file has several tab-delimited columns.

What I did so far was this:

cut -f2 | sort | uniq > Jobs
grep -f(tr '\t' '\n' < "${Jobs}") "${FileA}" > FileA+"${Jobs}"

I assumed that a new file would be created on each new pattern match, but I realized that it just copies the file, because nothing iterates over the patterns or creates the files one by one.

Since I don't have much experience with bash yet, I hope you can help me. Thanks in advance.

@update: Input file looks like this:

4   23454   22110   Direct  +   3245    Corrected
3   21254   12110   Indirect    +   2319    Paused-@2
11  45233   54103   Direct  -   1134    Not-Corrected

Essentially, I want every line whose status in column 7 is Corrected to go into a file named corrected, and likewise for every other unique value of column 7.

Upvotes: 2

Views: 118

Answers (2)

Zumo de Vidrio

Reputation: 2111

You can do it with grep inside a loop with:

for i in $(cat FileB); do grep "${i}\$" FileA >> "File${i}"; done

Note that in FileA of your question you wrote "Physicist" and in FileB you wrote "Physcist", so they won't match. Once both are spelled the same way, the above command will work.
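
As a rough sketch, a more defensive variant of the same loop could read FileB line by line, quote every expansion, and use > instead of >> so that running it twice does not duplicate lines. The sample data and the scratch directory below are assumptions, added only to make the snippet self-contained. Job names are still treated as regular expressions here.

```shell
#!/usr/bin/env bash
# Sketch only: sample files are created in a scratch directory for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'Peter Programmer' 'Frank Chemist' 'Charles Physicist' \
              'John Programmer' 'Alex Programmer' 'Harold Chemist' \
              'George Chemist' > FileA
printf '%s\n' 'Programmer' 'Chemist' 'Physicist' > FileB

# Read one job per line; IFS= and -r keep the line exactly as written.
while IFS= read -r job; do
    [ -n "$job" ] || continue            # skip empty lines
    # Match the job only when it is the last whitespace-separated field.
    grep -- "[[:space:]]${job}\$" FileA > "File${job}"
done < FileB

cat FileChemist
```

Anchoring on the preceding whitespace keeps a job such as "Chemist" from also matching a longer job that merely ends in the same letters.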

Upvotes: 1

Inian

Reputation: 85895

This is a job for Awk. Here is how you do it:

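awk '{ unique[$2]=(unique[$2] FS $1) }    # append each name to its job entry
END {
    for (i in unique) {
        len = split(unique[i], temp)
        # parenthesize the file name so the concatenation is unambiguous
        for (j = 1; j <= len; j++) print temp[j], i > ("File" i ".txt")
    }
}' file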

The idea is to build a hash map with unique[$2]=(unique[$2] FS $1): $2 is used as the index of the array unique, and each $1 is appended to the value stored under that index. Once every line of the input file has been processed, the array looks like this:

# <key>  <value(s)>
Chemist  Frank Harold George
Physicist  Charles
Programmer  Peter John Alex

The END clause is executed after all the lines have been processed. For each entry of the array, split() (which by default splits on runs of whitespace) stores the pieces of the value in the array temp, and len holds the number of elements produced by the split.
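
To illustrate the split() step on its own, here is a minimal sketch. The string assigned to value is an assumption about what unique["Chemist"] would hold for the sample data, including the leading separator left by the append.

```shell
# Sketch: split() with no third argument splits on runs of blanks and
# ignores the leading separator left by the (unique[$2] FS $1) append.
result=$(awk 'BEGIN {
    value = " Frank Harold George"     # assumed content of unique["Chemist"]
    len = split(value, temp)
    print len
    for (j = 1; j <= len; j++) print temp[j]
}')
printf '%s\n' "$result"
```

This prints the element count first and then one name per line, which is why the leading space in the stored value does not produce an empty first element.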

The outer loop visits each hash index, the inner loop visits each element produced by the split, and every name is printed together with its job into the file for that job.
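
For the tab-delimited input shown in the question's update, a single-pass variant that writes each line straight to a file named after column 7 might look like the sketch below. It is not the method above, just an alternative under stated assumptions: the file name input.tsv and the scratch directory are hypothetical, and the output files are named exactly after the column value (Corrected, not corrected).

```shell
#!/usr/bin/env bash
# Sketch only: "input.tsv" and the scratch directory are illustrative names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '4\t23454\t22110\tDirect\t+\t3245\tCorrected\n'      >  input.tsv
printf '3\t21254\t12110\tIndirect\t+\t2319\tPaused-@2\n'    >> input.tsv
printf '11\t45233\t54103\tDirect\t-\t1134\tNot-Corrected\n' >> input.tsv

# -F'\t' splits on tabs; each whole line is appended to the file named
# after its seventh column. close() after every write keeps only one
# output file open at a time, which helps with awk implementations that
# have a low open-file limit.
awk -F'\t' '{
    out = $7
    print >> out
    close(out)
}' input.tsv

ls
```

Because the lines are appended with >>, the output files should be removed before running this again on the same data.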

Upvotes: 2
