Whin3

Reputation: 725

Search for a word in the next lines after a first word has been found

Take, for example, this file, textfile.txt:

foo
bar
foo
bar
foo**word1**bar
foo
bar**word2**foo
foo
foo
bar
foo**word1**bar
foo
foo
bar**word2**foo
foo
foo
bar
foo**word1**bar
foo
bar**word2**foo
foo
bar
foo**word1**bar
foo
bar
foo
bar
bar**word2**foo
foo

What I am trying to do is: search a file for a first word (here **word1**) and, if that word is found, search the same line and the next two lines for a second word (here **word2**).

I tried using grep with the -n option to find **word1** and get its line number. Then, with that line number, I extracted the matching line and the next two with sed, and ran another grep on them to search for **word2**. This should work for every occurrence of **word1** and **word2**, not just the first one.

But it doesn't feel like it's the best way to achieve this.
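For reference, the grep, then sed, then grep approach described above can be sketched roughly as follows. This is a minimal sketch, not the OP's actual script: the function name naive_scan is hypothetical, the file is passed as the first argument, and word1, word2 and the two-line window are hard-coded.

```shell
#!/bin/sh
# Hypothetical helper: grep -> sed -> grep, as described in the question.
# $1 is the input file; word1, word2 and the 2-line window are hard-coded.
naive_scan() {
  file=$1
  # grep -n prints "LINENO:text"; cut keeps only the line number.
  grep -n 'word1' "$file" | cut -d: -f1 | while read -r n; do
    end=$((n + 2))
    # Extract the word1 line and the next two, then look for word2 in them.
    # The inner grep -n gives the offset (1..3) inside that small window.
    sed -n "${n},${end}p" "$file" | grep -n 'word2' |
      while IFS=: read -r off text; do
        # Convert the window offset back to a line number in the file.
        printf '%s %s\n' "$((n + off - 1))" "$text"
      done
  done
}
```

On the sample file, naive_scan textfile.txt prints 7 bar**word2**foo and 20 bar**word2**foo. It starts a fresh sed and grep for every word1 hit and rereads the file each time, which is one reason a single pass with awk or sed is usually preferred.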

In this example, there should be 3 positive matches: the last one does not count because **word2** is 4 lines ahead of **word1**, and I want at most 2 lines ahead.

As for awk's output, I would like it to print the line numbers where the two words matched, along with the respective lines where they were found.

I also have a shell script that returns output. What I would like to do is: for each matching pair of words, print "my_script_result" + "awk_result" > file

Upvotes: 0

Views: 439

Answers (3)

Yunnosch

Reputation: 26703

Choosing grep from the tagged tools:

echo shelloutput && grep -nA2 "word1" EgrepToy.txt | egrep "word2"

Output:

shelloutput  
7-bar**word2**foo
20-bar**word2**foo

Since I am not sure whether I correctly understood "In this example, there should be 3 positive matches" (I think the OP and I count the "next lines" differently), here is an alternative that gives three:

echo shelloutput && grep -nA3 "word1" EgrepToy.txt | egrep "word2"  

Output:

shelloutput  
7-bar**word2**foo  
14-bar**word2**foo  
20-bar**word2**foo  

Both solutions work the same way:

  • create the desired shell output: echo shelloutput
  • continue immediately to grep: &&
  • grep for the first word: grep "word1"
  • include the right number of following lines in the output: -A2
  • add the input file line number: -n
  • grep the result for the second word: | egrep "word2"

Echoing shelloutput is a placeholder for anything you want to do.
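The commands above print the shell output once, before all matches. If, as the question asks, the script's output should be written next to every match, one possible sketch is to read the grep pipeline line by line. Here my_script and prefix_matches are hypothetical names, my_script stands in for the OP's real script and is assumed to be called once with its output reused for every match, and plain grep replaces egrep since no extended regex features are needed.

```shell
#!/bin/sh
# Hypothetical stand-in for the OP's script; replace with the real command.
my_script() {
  echo "my_script_result"
}

# $1: input file, $2: output file.
prefix_matches() {
  result=$(my_script)   # run the script once and reuse its output
  # Same pipeline as above: word1 plus 2 following lines, filtered for word2.
  grep -nA2 'word1' "$1" | grep 'word2' |
    while IFS= read -r match; do
      printf '%s %s\n' "$result" "$match"
    done > "$2"
}
```

With the sample file, prefix_matches EgrepToy.txt file writes my_script_result 7-bar**word2**foo and my_script_result 20-bar**word2**foo to file. If the script's output depends on the match, move the my_script call inside the loop.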

Upvotes: 0

Yunnosch

Reputation: 26703

Choosing sed from the tagged tools:

echo shelloutput && sed -En "/word1/{/word2/{=;p;};N;s/^.*\n//;/word2/{=;p;};N;s/^.*\n//;/word2/{=;p;};N;s/^.*\n//;/word2/{=;p;}}" EgrepToy.txt

Output:

shelloutput
7  
bar**word2**foo  
14  
bar**word2**foo  
20  
bar**word2**foo  

It works like this:

  • create some output: echo shelloutput
  • continue directly to sed: &&
  • look for the first word: /word1/{
  • look for the second word on that same line: /word2/{
  • if found, print the line number and the line: =;p;};
  • then, for each following line:
    • fetch the next line: N;
    • delete the first pattern space line, including its newline: s/^.*\n//;
    • look for the second word: /word2/{
    • print the line number: =;
    • print the matching line: p;};
  • repeat that fetch-and-check group once per following line to be scanned (three lines here)

If you want only two matches, i.e. only the two following lines scanned for word2, repeat the group one time fewer: simply delete one N;s/^.*\n//;/word2/{=;p;}; from the script.
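As a sketch of that two-line variant, here it is wrapped in a hypothetical shell function, sed_scan2, that takes the input file as its first argument. It assumes GNU sed, where -n suppresses automatic printing and \n in a regex matches the embedded newline that N appends. The older line is stripped after every N, so = and p only ever see the newly read line.

```shell
#!/bin/sh
# Hypothetical wrapper: scan word1's line and the next two lines for word2.
sed_scan2() {
  # /word1/{...}    only act on lines containing word1
  # /word2/{=;p;}   print the line number and the line if word2 is there
  # N;s/^.*\n//     append the next line, then drop the previous one
  sed -n '/word1/{/word2/{=;p;};N;s/^.*\n//;/word2/{=;p;};N;s/^.*\n//;/word2/{=;p;}}' "$1"
}
```

On the sample file this prints 7, bar**word2**foo, 20 and bar**word2**foo, one per line.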

Upvotes: 0

Kent

Reputation: 195029

This awk one-liner may help:

awk '/word1/{ok=1}ok && /word2/{print NR,$0}' file

In the line above, /word1/ is your first word and /word2/ is your second word. The output is the matched line numbers and the matched lines.

It works in this way:

The script reads the file from the beginning; once word1 is found, it sets the variable ok to 1 (true). The second part checks that ok is set AND word2 matches; if both hold, it prints the output. Thus, a word2 line that appears before any word1 is skipped, because ok is still false.
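A quick way to see this behavior is to wrap the first one-liner in a hypothetical function, awk_first, and run it on the sample file. Because ok is set at the first word1 and never reset, it reports every later word2 line regardless of distance, which is what the windowed edit addresses.

```shell
#!/bin/sh
# Hypothetical wrapper around the first one-liner; $1 is the input file.
awk_first() {
  # ok becomes 1 at the first word1 and stays 1 for the rest of the file,
  # so every word2 line after that point is printed with its line number.
  awk '/word1/{ok=1}ok && /word2/{print NR,$0}' "$1"
}
```

On the sample file it prints the word2 lines at 7, 14, 20 and 28, including the two that are too far from their word1.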

Edit, following the OP's update (word2 at most two lines after word1):

awk '/word1/{ok=1;s=NR}ok && NR<=s+2 && /word2/{print NR,$0}' file
7 bar**word2**foo
20 bar**word2**foo
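The question also asks for the word1 line number and line. One possible extension of the edited one-liner, as a sketch: pair_scan is a hypothetical function name, and the window size is passed with awk -v (defaulting to 2) instead of being hard-coded. Each output line holds the word1 line number, the word1 line, the word2 line number and the word2 line.

```shell
#!/bin/sh
# Hypothetical extension: print both halves of each word1/word2 pair.
# $1: input file; $2: how many lines after word1 may hold word2 (default 2).
pair_scan() {
  awk -v win="${2:-2}" '
    /word1/ { s = NR; w1 = $0 }       # remember the latest word1 line
    s && NR <= s + win && /word2/ {    # word2 on that line or within win lines
      print s, w1, NR, $0
    }
  ' "$1"
}
```

pair_scan textfile.txt prints 5 foo**word1**bar 7 bar**word2**foo and 18 foo**word1**bar 20 bar**word2**foo; pair_scan textfile.txt 3 also reports the pair at lines 11 and 14. This output can then be fed to the same prefixing loop as any other command.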

Upvotes: 1
