user3979986

Reputation: 203

Efficient way to parse txt file in bash/perl

I have a large number of text files, each 300k+ lines long.

The files are in this general format:

Username <user> filename <file>
<some large amount of text on one line>
...

The text file has this strict format: one line of formatted header text, followed by one really long line, which is the meat and potatoes of the file.

What I want to do is go through the file and, for every set of lines (a set consisting of the header and the one long line), look for some matching string within this long line.

If the string is there, then I want to print user and file. If not, then we continue on and don't print anything. And for those who will ask, the point of this exercise is just to print this out; I will do some manipulation at a later point.

I know how to do this, but it is sort of brute force: just store the user and file when you detect them, and if we detect the matching string, print user and file. If not, just continue. However, this is extremely inefficient:

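#!/bin/bash
## not exact, just roughly what I am doing
re='Username ([^ ]+) filename ([^ ]+)'   # keep the regex in a variable so its spaces survive
while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
        # store our variables
        user=${BASH_REMATCH[1]}
        file=${BASH_REMATCH[2]}
        continue
    fi
    if [[ $line =~ "string" ]]; then
        # print user and file
        printf '%s, %s\n' "$user" "$file"
    fi
done < inputfile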

Basically, is there some efficient way to detect the string I am looking for, then look back x number of lines (x corresponding to number of header lines) and then pull out the info I need? Thanks

PS: I'm not set on doing this in bash; perl works too.

EDIT: DESIRED OUTPUT

 <user>, <file>
 <user>, <file>
 ...

Upvotes: 2

Views: 458

Answers (2)

Thomas Foster

Reputation: 321

Awk solution, relying on each record being two lines (and the first line of the file being the header for the first record):

NR%2 { name = $2; file = $4; next }
/string/ { print name ", " file }
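
To try this on several files at once, here is a minimal sketch, assuming a POSIX awk and a `mktemp` that supports `-d`. The sample files, the `find_matches` helper and the search term `needle` are all made up for illustration. It uses `FNR` rather than `NR` so the odd/even test restarts with each file, does a literal substring search with `index()` instead of a regex match, and prints the fields in the comma-separated form asked for in the question:

```shell
#!/bin/sh
# Hypothetical sample data: two-line records (header line, then one long line).
tmpdir=$(mktemp -d)
printf '%s\n' \
    'Username alice filename a.txt' 'lorem needle ipsum' \
    'Username bob filename b.txt'   'nothing to see here' > "$tmpdir/one.txt"
printf '%s\n' \
    'Username carol filename c.txt' 'xx needle yy' > "$tmpdir/two.txt"

# find_matches PATTERN FILE...   (illustrative helper name, an assumption)
find_matches() {
    pattern=$1; shift
    awk -v pat="$pattern" '
        FNR % 2 { name = $2; file = $4; next }    # odd line of each file: header
        index($0, pat) { print name ", " file }   # even line: literal substring test
    ' "$@"
}

result=$(find_matches needle "$tmpdir/one.txt" "$tmpdir/two.txt")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Since awk reads each file once and only splits fields on the short header lines' turn, this should cope with 300k+ line inputs far better than a shell `while read` loop, though I have not benchmarked it on real data.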

Upvotes: 1

glenn jackman

Reputation: 246837

For really heavy text processing like this, perl is a good choice:

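perl -nE '
  if ($. % 2 == 1) {
    # odd line: header; a bare split breaks $_ on whitespace
    ($user, $file) = (split)[1,3];
  }
  elsif (/search string/) {
    say "$user, $file";
  }
  close ARGV if eof;  # reset $. so each file starts on an odd line
' file1 file2 ...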

That can be "golfed" down to a more terse one-liner, if you like that kind of thing.

Upvotes: 1
