question33

Reputation: 19

Awk With Input File Match and Pattern Search

Sorry but I have never asked a question on a board such as this, please excuse inexperience.

I am trying to take a field from an input file, say field two from abc.txt, and match it in the def.txt. The problem is I also need to match an additional pattern in the def.txt file.

For example, field 2 in abc.txt is "3", and the pattern I want to search for in def.txt is "efg". I need it to return all lines that match the pattern "efg" and that contain "3".

As an additional constraint I want it to stop searching after it reaches a certain value, say "END". I have exhausted my efforts to find a simple one liner for this in awk or any variant.

I am befuddled on all of these points, is it ok to ask for help on this as a novice? Any assistance is appreciated, thanks.

Here is the code, which is not working at all: awk 'BEGIN { FS = " " } ;NR==FNR{a[$2]=++i;next} '{if ( $5 in a) && ($0 ~ '/efg/')} {print $0}' abc.txt def.txt

I am trying to achieve 3 things:

  1. Match input file field to def.txt fields

  2. Match a pattern in def.txt

  3. Stop the search when a value is encountered, for example "END".

Hoping for a one line solution if possible, I am just too much of an AWK beginner.

Sample Input
abc.txt
1
2
3
4

def.txt
1 abc
1 efg
1 efg some more data
END
2 ghi
2 efg
2 efg some more data
END
3 jkl
3 efg
3 efg some more data
END

and so on...

Expected Output 
1 efg
1 efg some more data
2 efg
2 efg some more data
3 efg 
3 efg some more data

I would also like it to stop upon reaching "END", instead of going through the entire file and printing the subsequent instances of 1 efg, 2 efg, etc.

Upvotes: 1

Views: 990

Answers (1)

ghoti

Reputation: 46856

There are some obvious concerns with your existing code. You provided:

awk 'BEGIN { FS = " " } ;NR==FNR{a[$2]=++i;next} '{if ( $5 in a) && ($0 ~ '/efg/')} {print $0}' abc.txt def.txt

I see where you're going with this. I think what you mean is:

awk '

  # Step through first file, recording $2 in an array...
  NR==FNR {
    a[$2];
    next;
  }

  # Hard stop if we get a signal...
  $0 == "END" {
    exit;
  }

  # In the second+ file, test a condition.
  $5 in a && /efg/

' abc.txt def.txt

You can of course compress this into a one liner by removing comments and newlines:

awk 'NR==FNR{a[$2];next} $0=="END"{exit} $5 in a && /efg/' abc.txt def.txt
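
If you want to try this against the sample data from the question, here is a minimal sketch. It assumes the key is the first field in both files (which is what the sample shows), so $2 and $5 become $1; adjust that to your real layout. The temporary directory and the shortened def.txt are only for illustration, and the hard stop uses awk's exit statement.

```shell
#!/bin/sh
# Sketch only: the field numbers ($1) are assumed from the sample data,
# not taken from the original command, which used $2 and $5.
tmp=$(mktemp -d)

printf '%s\n' 1 2 3 4 > "$tmp/abc.txt"
printf '%s\n' '1 abc' '1 efg' '1 efg some more data' 'END' \
              '2 ghi' '2 efg' '2 efg some more data' 'END' > "$tmp/def.txt"

# First file: remember each key.
# Second file: stop at the first END line; otherwise print lines whose
# key was seen in the first file and that match /efg/.
out=$(awk 'NR==FNR{a[$1];next} $0=="END"{exit} $1 in a && /efg/' \
      "$tmp/abc.txt" "$tmp/def.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Because of the hard stop, this prints only the two "1 efg" lines. If you actually want the matches from every block, as in your expected output, drop the $0=="END" rule: the END lines do not match /efg/, so they are never printed anyway.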

Notable changes:

  • The single quotes need to wrap your entire script. One at the start, one at the end, none "inside".
  • Awk splits on runs of spaces and tabs by default, and FS = " " is that same default, so the BEGIN block is unnecessary (if your fields are tab-separated and can themselves contain spaces, set FS = "\t" instead).
  • You don't need to increment a counter. In awk, if you simply mention an array element, it is "created" without having content, so you can use conditions like $5 in a without wasting so much memory.
  • The extra if statement was removed. Awk takes condition { statement } patterns. A condition is a condition whether it's in this format or inside an if.
  • The second element of your condition was shrunk to just a regex. By default, awk will take this to mean "does this regex apply to the current input line".
  • The print $0 command was removed, because this is the default behaviour if no statement is provided.
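
A few of the points above are easy to check from the shell. This is a small sketch with made-up input lines, not part of the original command; the variable names are only there to hold each result.

```shell
#!/bin/sh
# Default field splitting already treats tabs and repeated spaces as separators.
fields=$(printf 'a\tb  c\n' | awk '{ print NF }')

# Mentioning a[$1] creates the key with an empty value;
# "in" then tests for the key without needing a counter.
created=$(printf '3\n' | awk '{ a[$1] } END { if ("3" in a) print "present, length " length(a["3"]) }')

# A bare condition prints the matching line, the same as the explicit if/print form.
short=$(printf '3 efg\n3 jkl\n' | awk '/efg/')
long=$(printf '3 efg\n3 jkl\n' | awk '{ if ($0 ~ /efg/) print $0 }')

printf '%s\n%s\n%s\n%s\n' "$fields" "$created" "$short" "$long"
```

The last two commands should give identical output, which is why the if and the print $0 could both go.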

Upvotes: 2
