Yugal Jindle

Reputation: 45646

How to extract the lines between patterns?

I have a file in the following format:

[PATTERN]
line1
line2
line3
.
.
.
line
[PATTERN]
line1
line2
line3
.
.
.
line
[PATTERN]
line1
line2
line3
.
.
.
line

I want to extract blocks like the following from the file above:

[PATTERN]
line1
line2
line3
.
.
.
line

Note: The number of lines between two [PATTERN] lines may vary, so I can't rely on a fixed line count.

Basically, I want to store each pattern and the lines following it in a database, so I will have to iterate over all such blocks in my file.

How can I do this with shell scripting?

Upvotes: 0

Views: 3889

Answers (2)

blueFast

Reputation: 44331

This assumes you are using bash as your shell. For other shells, the actual solution can be different.

Assuming your data is in data:

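i=0
# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line ; do
  if [ "$line" = "[PATTERN]" ] ; then
    # Separator found: start the next output file and skip the separator line
    i=$((i + 1)) ; touch "file.$i" ; continue
  fi
  printf '%s\n' "$line" >> "file.$i"
done < data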

Replace [PATTERN] with your actual separator pattern.

This will create files file.1, file.2, etc.

Edit: responding to request about an awk solution:

awk '/^\[PATTERN\]$/{close("file"f);f++;next}{print $0 > ("file"f)}' data

The idea is to start a new file each time [PATTERN] is found (skipping that line with the next command) and to write all subsequent lines to that file. The files are named file1, file2, etc. If you need to include [PATTERN] in the generated files, delete the next command.

Notice the escaping of the [ and ], which have special meaning for regular expressions. If your pattern does not contain those, you do not need the escaping. The ^ and $ are advisable, since they tie your pattern to the beginning and end of line, which you will usually need.
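Since the question asks for each block together with its [PATTERN] line, here is a minimal sketch of the variant without next, wrapped in a function. The names split_blocks, data.txt and block are assumptions made for illustration, not part of the answer above, and the sample data is made up.

```shell
#!/bin/bash
# split_blocks is a hypothetical helper name: it writes each block of INPUT,
# including its [PATTERN] header line, to PREFIX.1, PREFIX.2, ...
split_blocks() {
  local input="$1" prefix="$2"
  awk -v prefix="$prefix" '
    # Header line: close the previous output file and switch to the next one.
    /^\[PATTERN\]$/ { if (out) close(out); n++; out = prefix "." n }
    # n stays 0 until the first header, so earlier lines are ignored.
    n               { print > out }
  ' "$input"
}

# Usage with made-up sample data in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '[PATTERN]' a1 a2 '[PATTERN]' b1 > "$workdir/data.txt"
split_blocks "$workdir/data.txt" "$workdir/block"
cat "$workdir/block.2"
```

With this sample input, block.2 should contain the [PATTERN] line followed by b1. Each file can then be handed to whatever loads the data into the database.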

Upvotes: 1

Plouff

Reputation: 3470

This can certainly be improved, but if you want to store the lines in an array, here is something I did in the past:

#!/bin/bash
file=$1
gp_cnt=-1
i=-1

while IFS= read -r line
do
  # Match pattern
  if [[ "$line" == "[PATTERN]" ]]; then
    let "gp_cnt +=1"
    # If this is not the first match process group
    if [[ $gp_cnt -gt 0 ]]; then
      # Process the group
      echo "Processing group #$((gp_cnt - 1))"
      # Quoted so that [PATTERN] is not expanded as a glob
      echo "${parsed[*]}"
    fi
    # Start new group
    echo "Pattern #$gp_cnt caught"
    i=0
    unset parsed
    parsed[$i]="$line"

    # Other lines (lines before first pattern are not processed)
  elif [[ $gp_cnt != -1 ]]; then
    let "i +=1"
    parsed[$i]="$line"
  fi
done < "$file"

# Process last group
echo "Processing group #$gp_cnt"
echo "${parsed[*]}"

I don't like having to process the last group outside the loop...
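One possible way around that, sketched here under assumptions: feed the loop one extra [PATTERN] line after the file, so the last real block is flushed by the same code path as the others. The names for_each_block and process_group are hypothetical, process_group only prints a summary as a stand-in for real processing, and the sketch assumes the input file ends with a newline.

```shell
#!/bin/bash
# process_group is a hypothetical callback: $1 is the group number,
# the remaining arguments are the lines of the group (header first).
process_group() {
  local index="$1"; shift
  echo "group $index: $# lines, header: $1"
}

# for_each_block FILE: calls process_group once per [PATTERN] block.
for_each_block() {
  local file="$1" line count=0
  local -a parsed=()
  while IFS= read -r line; do
    if [[ $line == "[PATTERN]" ]]; then
      # A new header closes the previous group, if there was one.
      (( count > 0 )) && process_group "$count" "${parsed[@]}"
      count=$((count + 1))
      parsed=("$line")
    elif (( count > 0 )); then
      # Lines before the first header are ignored.
      parsed+=("$line")
    fi
  # The extra header appended here is a sentinel that flushes the last group.
  done < <(cat -- "$file"; echo "[PATTERN]")
}

# Usage with made-up sample data.
workdir=$(mktemp -d)
printf '%s\n' '[PATTERN]' a1 a2 '[PATTERN]' b1 > "$workdir/data.txt"
for_each_block "$workdir/data.txt"
```

The trade-off is that the sentinel must be a line that counts as a header, so it has to be kept in sync with the comparison inside the loop.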

Upvotes: 0
