Reputation: 2298
I want to print each programming language from file1 that appears in file2, together with its line number in file2 and the complete matching line of file2.
file1 is like this:
Ruby
Visual Basic
Objective-C
C
R
C++
Basic
file2 is like this:
5. ab cde fg Java hij kl
2. ab PHP dddf llf
4. cde fg z o Objective-C oode
8. a12b cde JavaScript kdk
6. ab99r cde Visual Basic llso dkd
1. lkd dsk Ruby kksdk
3. Python dsdls
9. CSS dkdsk
4. Jdjdj C Jjd Kkd
12. Iiii Jjd R Hhd
5. Jjjff C++ jdjejd
7. Jfjfjdoo Uueye Basic Jje Tasdk
I'd like to get this output:
6|Ruby|1. lkd dsk Ruby kksdk
5|Visual Basic|6. ab99r cde Visual Basic llso dkd
3|Objective-C|4. cde fg z o Objective-C oode
9|C|4. Jdjdj C Jjd Kkd
10|R|12. Iiii Jjd R Hhd
11|C++|5. Jjjff C++ jdjejd
12|Basic|7. Jfjfjdoo Uueye Basic Jje Tasdk
where 6, 5 and 3 are the line numbers in file2 on which "Ruby", "Visual Basic" and "Objective-C" appear.
So far I've tried the code below, but it only works when each line of file2 is an exact match for a line of file1.
awk 'NR == FNR{a[$0];next} ($0 in a)' file1 file2
In this case the programming languages in file2 have text before and after them, and I'm stuck on how to get the output I want.
Thanks in advance for any help.
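The difference between the two kinds of lookup can be seen on a tiny sample. This is only a sketch: the one-line files, the temporary directory and the variable names (demo_dir, exact, partial) are hypothetical and stand in for the real file1 and file2.

```shell
# Hypothetical one-line samples, not the full files from the question.
demo_dir=$(mktemp -d)
printf 'Ruby\n' > "$demo_dir/file1"
printf '1. lkd dsk Ruby kksdk\n' > "$demo_dir/file2"

# Whole-line lookup: "$0 in a" asks whether the entire line is a key of a.
exact=$(awk 'NR == FNR {a[$0]; next} ($0 in a)' "$demo_dir/file1" "$demo_dir/file2")

# Substring lookup: index() returns a non-zero position when the
# language occurs anywhere inside the line.
partial=$(awk 'NR == FNR {a[$0]; next}
  {for (lang in a) if (index($0, lang)) print FNR "|" lang "|" $0}' \
  "$demo_dir/file1" "$demo_dir/file2")

echo "exact:   [$exact]"
echo "partial: [$partial]"
rm -rf "$demo_dir"
```

The first command prints nothing because no line of file2 equals "Ruby" exactly; the second prints 1|Ruby|1. lkd dsk Ruby kksdk.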
Upvotes: 1
Views: 118
Reputation: 203899
With GNU awk for sorted_in to search for the longest languages (e.g. Visual Basic) first and remove those from the current line as they're found so the shorter languages that are part of them (e.g. Basic) can't be found within them:
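$ cat tst.awk
BEGIN { OFS="|" }
NR==FNR {
    lengths[$0] = length($0)    # map each language to its length
    next
}
{
    line = " " $0 " "           # pad so a language at either end of the line still has spaces around it
    PROCINFO["sorted_in"] = "@val_num_desc"    # gawk only: visit the longest languages first
    for (lang in lengths) {
        if ( s = index(line," "lang" ") ) {
            print FNR, lang, $0
            # cut the matched language out so shorter languages inside it cannot match
            line = substr(line,1,s) substr(line,s+1+lengths[lang])
        }
    }
}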
$ awk -f tst.awk file1 file2
3|Objective-C|4. cde fg z o Objective-C oode
5|Visual Basic|6. ab99r cde Visual Basic llso dkd
6|Ruby|1. lkd dsk Ruby kksdk
$ cat file1
Ruby
Visual Basic
Objective-C
C
C++
Basic
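If GNU awk is not available, the same longest-first idea can probably be approximated in POSIX awk by sorting the language list by hand. The following is a minimal sketch under that assumption, run on a cut-down sample rather than the full files; the langs array, the insertion sort, and the names workdir and result are illustrative and not part of the script above.

```shell
# Hypothetical cut-down samples of file1 and file2.
workdir=$(mktemp -d)
printf '%s\n' 'Ruby' 'Visual Basic' 'Basic' > "$workdir/file1"
printf '%s\n' '6. ab99r cde Visual Basic llso dkd' \
              '7. Jfjfjdoo Uueye Basic Jje Tasdk' > "$workdir/file2"

result=$(awk '
BEGIN { OFS = "|" }
NR == FNR {                 # first file: collect the languages
    langs[++n] = $0
    next
}
FNR == 1 {                  # first line of the second file: sort once, longest first
    for (i = 2; i <= n; i++) {
        cur = langs[i]
        for (j = i - 1; j >= 1 && length(langs[j]) < length(cur); j--)
            langs[j + 1] = langs[j]
        langs[j + 1] = cur
    }
}
{
    line = " " $0 " "       # pad so matches at either end still have spaces around them
    for (i = 1; i <= n; i++) {
        lang = langs[i]
        if ((s = index(line, " " lang " ")) > 0) {
            print FNR, lang, $0
            # remove the match so a shorter language inside it is not reported
            line = substr(line, 1, s) substr(line, s + 1 + length(lang))
        }
    }
}' "$workdir/file1" "$workdir/file2")

printf '%s\n' "$result"
rm -rf "$workdir"
```

On this sample the sketch prints 1|Visual Basic|6. ab99r cde Visual Basic llso dkd followed by 2|Basic|7. Jfjfjdoo Uueye Basic Jje Tasdk, so "Basic" is not reported for the "Visual Basic" line.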
Upvotes: 1
Reputation: 133600
Could you please try the following (changed to use index in the code, as per @Ed Morton's suggestion).
awk -v OFS='|' '
FNR==NR{
  a[$0]
  next
}
{
  for(i in a){
    if(index(" "$0" "," "i" ")){
      print FNR,i,$0
    }
  }
}
' Input_file1 Input_file2 | sort -t'|' -nr
Output will be as follows.
6|Ruby|1. lkd dsk Ruby kksdk
5|Visual Basic|6. ab99r cde Visual Basic llso dkd
3|Objective-C|4. cde fg z o Objective-C oode
Explanation: a line-by-line explanation of the above code.
awk -v OFS='|' '                    ##Start the awk program with the output field separator set to |.
FNR==NR{                            ##This condition is TRUE only while the first Input_file is being read.
  a[$0]                             ##Create an element in array a whose index is the current line (a language); its value is left empty.
  next                              ##Skip the rest of the program for this line.
}
{                                   ##This block runs for every line of the second Input_file.
  for(i in a){                      ##Loop over every index (language) stored in array a.
    if(index(" "$0" "," "i" ")){    ##Check whether language i occurs in the current line as a space-delimited string.
      print FNR,i,$0                ##If so, print FNR"|"i"|"$0 as the OP needs.
    }
  }
}
' Input_file1 Input_file2 | sort -t'|' -nr   ##Name the Input_files, then pipe the output to sort for reverse numeric order on the leading line number.
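Note that the reverse numeric sort orders the output by line number, which is not necessarily the order of the languages in the first file. One possible way to keep that order is to buffer the matches and print them at the end. This is a sketch only, on a hypothetical two-line sample; the arrays order and found and the names sample_dir and ordered are illustrative. It keeps the same simple index test, so a language contained in a longer one (Basic inside Visual Basic) would still be reported for the longer language's line.

```shell
# Hypothetical two-line samples of the input files.
sample_dir=$(mktemp -d)
printf '%s\n' 'Ruby' 'C' > "$sample_dir/file1"
printf '%s\n' '4. Jdjdj C Jjd Kkd' '1. lkd dsk Ruby kksdk' > "$sample_dir/file2"

ordered=$(awk -v OFS='|' '
FNR == NR {                 # first file: remember each language and its position
    order[++n] = $0
    next
}
{
    for (k = 1; k <= n; k++) {
        lang = order[k]
        if (index(" " $0 " ", " " lang " ")) {
            # append, so a language found on several lines keeps all of them
            found[lang] = found[lang] FNR OFS lang OFS $0 "\n"
        }
    }
}
END {                       # print the buffered matches in the order of the first file
    for (k = 1; k <= n; k++)
        printf "%s", found[order[k]]
}' "$sample_dir/file1" "$sample_dir/file2")

printf '%s\n' "$ordered"
rm -rf "$sample_dir"
```

Here the sketch prints 2|Ruby|1. lkd dsk Ruby kksdk first and 1|C|4. Jdjdj C Jjd Kkd second, following the order of the first file rather than the line numbers.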
Upvotes: 3