Print strings of file1 that appear in file2 in awk

Question

I want to print the programming languages from file1 that appear in file2, together with the line number in file2 where each one appears and the complete line from file2.

file1 is like this:

Ruby
Visual Basic
Objective-C
C
R
C++
Basic

file2 is like this:

5. ab cde fg Java hij kl
2. ab PHP dddf llf 
4. cde fg z o Objective-C oode
8. a12b cde JavaScript kdk
6. ab99r cde Visual Basic llso dkd
1. lkd dsk Ruby kksdk
3. Python dsdls
9. CSS dkdsk
4. Jdjdj C Jjd Kkd
12. Iiii Jjd R Hhd
5. Jjjff C++ jdjejd
7. Jfjfjdoo Uueye Basic Jje Tasdk

I'd like to get this output:

 6|Ruby|1. lkd dsk Ruby kksdk
 5|Visual Basic|6. ab99r cde Visual Basic llso dkd
 3|Objective-C|4. cde fg z o Objective-C oode
 9|C|4. Jdjdj C Jjd Kkd
 10|R|12. Iiii Jjd R Hhd
 11|C++|5. Jjjff C++ jdjejd
 12|Basic|7. Jfjfjdoo Uueye Basic Jje Tasdk

where 6, 5 and 3 are the line numbers in file2 on which "Ruby", "Visual Basic" and "Objective-C" appear.

So far I've tried the code below, but it only works when each line of file2 is an exact match for a line of file1.

awk 'NR == FNR{a[$0];next} ($0 in a)' file1 file2

In this case the programming languages in file2 have text before and after them, and I'm stuck on how to get the output I want.

Thanks in advance for any help.

Ed Morton · Accepted Answer

With GNU awk for sorted_in: search for the longest languages (e.g. Visual Basic) first and remove each one from the current line as it is found, so that shorter languages contained in them (e.g. Basic) can't be matched inside them:

$ cat tst.awk
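BEGIN { OFS="|" }
NR==FNR {
    # First file: remember each language together with its length.
    lengths[$0] = length($0)
    next
}
{
    # Pad with blanks so a language at the start or end of the line still matches.
    line = " " $0 " "
    # GNU awk only: visit array elements by numeric value, largest first.
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (lang in lengths) {
        if ( s = index(line," "lang" ") ) {
            print FNR, lang, $0
            # Cut the match out of the line, keeping the blanks around it.
            line = substr(line,1,s) substr(line,s+1+lengths[lang])
        }
    }
}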

$ awk -f tst.awk file1 file2
3|Objective-C|4. cde fg z o Objective-C oode
5|Visual Basic|6. ab99r cde Visual Basic llso dkd
6|Ruby|1. lkd dsk Ruby kksdk
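
The removal step is easier to follow in isolation. The sketch below is not part of the script above: it hard-codes one line of file2 and one language, and the variable name masked is made up for illustration. It shows that once "Visual Basic" has been cut out of the padded line, " Basic " can no longer be found in it:

```shell
out=$(awk 'BEGIN {
    # One line of file2, padded with a blank on each side as in tst.awk.
    line = " 6. ab99r cde Visual Basic llso dkd "
    lang = "Visual Basic"

    # s is the position of the blank in front of the language.
    s = index(line, " " lang " ")

    # Keep everything up to and including that blank, then resume at the
    # blank that follows the language. "masked" is an illustrative name.
    masked = substr(line, 1, s) substr(line, s + 1 + length(lang))

    print "s=" s
    print "Basic before removal: " index(line, " Basic ")
    print "Basic after removal: " index(masked, " Basic ")
}')
printf '%s\n' "$out"
```

This should report s=14, a match for " Basic " at position 21 before the removal, and 0 (no match) afterwards.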

$ cat file1
Ruby
Visual Basic
Objective-C
C
C++
Basic
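
If GNU awk is not available, the same longest-first idea can probably be kept by sorting the languages by length by hand. What follows is only a sketch under that assumption, not the script from the answer: the array name langs, the counter n and the insertion sort are illustrative choices, and the sample files are the file1 and file2 from the question (so file1 includes R), written to a scratch directory.

```shell
# Scratch directory holding copies of the sample files from the question.
dir=$(mktemp -d)

cat > "$dir/file1" <<'EOF'
Ruby
Visual Basic
Objective-C
C
R
C++
Basic
EOF

cat > "$dir/file2" <<'EOF'
5. ab cde fg Java hij kl
2. ab PHP dddf llf
4. cde fg z o Objective-C oode
8. a12b cde JavaScript kdk
6. ab99r cde Visual Basic llso dkd
1. lkd dsk Ruby kksdk
3. Python dsdls
9. CSS dkdsk
4. Jdjdj C Jjd Kkd
12. Iiii Jjd R Hhd
5. Jjjff C++ jdjejd
7. Jfjfjdoo Uueye Basic Jje Tasdk
EOF

out=$(awk '
BEGIN { OFS = "|" }
NR == FNR {                      # first file: collect the languages in order
    langs[++n] = $0
    next
}
FNR == 1 {                       # first line of the second file: sort once
    for (i = 2; i <= n; i++) {   # insertion sort, longest language first
        tmp = langs[i]
        for (j = i - 1; j >= 1 && length(langs[j]) < length(tmp); j--)
            langs[j + 1] = langs[j]
        langs[j + 1] = tmp
    }
}
{
    line = " " $0 " "            # pad so matches at either end still work
    for (i = 1; i <= n; i++) {
        lang = langs[i]
        if ( s = index(line, " " lang " ") ) {
            print FNR, lang, $0
            # remove the match so shorter languages cannot match inside it
            line = substr(line, 1, s) substr(line, s + 1 + length(lang))
        }
    }
}
' "$dir/file1" "$dir/file2")

printf '%s\n' "$out"
rm -rf "$dir"
```

The results come out in file2 order rather than file1 order, as with the GNU awk version. An insertion sort is quadratic, which should be fine for a short list of languages; for a very long file1 a different sorting approach would be worth considering.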
