user3299633

Reputation: 3390

Grab section of text where a string resides

I have a text file that looks like this:

# Query 1:
.
.
.
# Hosts ip-127-0-0-1
.
.
.

# Query 2:
.
.

In my file there might be multiple queries; I only want to extract a query's block when its host IP is NOT a certain value.

For example, here I want to capture everything from '# Query 1' up to the blank line right before '# Query 2', but ONLY if the # Hosts IP does NOT match ip-127-0-0-1. This isn't an exact match, as ip-127-0-0-1 can have other text appended to the end, like ip-127-0-0-1.notusefultext.

I'm open to using awk, sed or python to assist with this problem.
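Since Python is on the table, here is a minimal sketch of the requested filter. It assumes every query starts with a line of the form # Query N: and that the host sits on a line starting with # Hosts; the function name queries_without_host and the sample text are illustrative, not from the question. The (?!\d) lookahead is an added assumption so that a suffix like .notusefultext still counts as a match while ip-127-0-0-10 does not.

```python
import re

# Assumed header format: a line starting with "# Query <digits>:".
HEADER = re.compile(r"^# Query \d+:", re.MULTILINE)


def queries_without_host(text: str, host: str = "ip-127-0-0-1") -> list[str]:
    """Return every '# Query N:' block whose '# Hosts' line does not name host."""
    # The host may carry a suffix such as ".notusefultext"; the (?!\d) lookahead
    # keeps ip-127-0-0-1 from also matching ip-127-0-0-10 (an assumption).
    host_line = re.compile(r"^# Hosts[ \t]+" + re.escape(host) + r"(?!\d)", re.MULTILINE)
    starts = [m.start() for m in HEADER.finditer(text)]
    ends = starts[1:] + [len(text)]
    # Each block runs from its header up to the next header (or end of text),
    # so the blank separator line stays with the block above it.
    blocks = [text[s:e] for s, e in zip(starts, ends)]
    return [b for b in blocks if not host_line.search(b)]


sample = """# Query 1:
select 1
# Hosts ip-127-8-8-1

# Query 2:
select 2
# Hosts ip-127-0-0-1.notusefultext

# Query 3:
select 3
# Hosts ip-127-0-0-10
"""

print("".join(queries_without_host(sample)), end="")
```

To run it on a real file, pass the file contents instead of sample, e.g. queries_without_host(open(path).read()).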

Upvotes: 0

Views: 127

Answers (4)

stack0114106

Reputation: 8811

I assume your query file looks like the one below. Please try this Perl solution:

$ cat query_ip.txt
# Query 1:
select a b c from
tab
# Hosts ip-127-8-8-1
where a = '1'

# Query 2:
select a b c from
tab2
# Hosts ip-127-0-0-1
where a = '1'

# Query 3:
select a b c from
tab3
# Hosts ip-127-9-9-1
where a = '1'

$  perl -0777 -ne ' $_.="# Query "; while( /(# Query.+?)(# Query.+)/smg ) { $x=$1 ; $_="$2"; print $x if not $x=~/ip-127-0-0-1/ } ' query_ip.txt
# Query 1:
select a b c from
tab
# Hosts ip-127-8-8-1
where a = '1'

# Query 3:
select a b c from
tab3
# Hosts ip-127-9-9-1
where a = '1'

$

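Or try this variant, which matches up to the next # Query or the end of the input (\Z) instead of appending a sentinel string: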

$ perl -0777 -ne ' while( /(# Query.+?)(# Query.+|\Z)/smg ) { $x=$1 ; $_="$2"; print "$x\n" if not $x=~/ip-127-0-0-1/ } ' query_ip.txt
# Query 1:
select a b c from
tab
# Hosts ip-127-8-8-1
where a = '1'


# Query 3:
select a b c from
tab3
# Hosts ip-127-9-9-1
where a = '1'
$
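For comparison, the same lazy-match idea can be sketched in Python. re.DOTALL plays the role of Perl's /s, and the lookahead (?=# Query|\Z) stops each match just before the next header without consuming it, so there is no need for a sentinel or for reassigning the input. The function name blocks_without and the inline sample (copied from the query_ip.txt example above) are assumptions for illustration.

```python
import re

# Lazily match from one "# Query" to just before the next one (or end of text).
# DOTALL lets "." cross newlines, like Perl's /s modifier.
BLOCK = re.compile(r"# Query.+?(?=# Query|\Z)", re.DOTALL)


def blocks_without(text: str, ip: str = "ip-127-0-0-1") -> list[str]:
    # Keep only the blocks that never mention the excluded IP string.
    return [b for b in BLOCK.findall(text) if ip not in b]


sample = (
    "# Query 1:\nselect a b c from\ntab\n# Hosts ip-127-8-8-1\nwhere a = '1'\n\n"
    "# Query 2:\nselect a b c from\ntab2\n# Hosts ip-127-0-0-1\nwhere a = '1'\n\n"
    "# Query 3:\nselect a b c from\ntab3\n# Hosts ip-127-9-9-1\nwhere a = '1'\n"
)
print("".join(blocks_without(sample)), end="")
```

Unlike the second Perl command, this does not add an extra newline after each block, so the output keeps the file's original spacing.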

Upvotes: 0

user3299633

Reputation: 3390

Final working solution:

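# Assumes LONG_RUNNING_QUERIES (input file) and IP_ADDY (IP text to exclude) are set earlier.
# Pull in isolated code block for each individual query and write to unique file.
TEMP='temp_file'
counter=0   # lines before the first "# Query" header go to ${TEMP}_0
while IFS= read -r line; do
    if [[ $line =~ ^#[[:space:]]Query[[:space:]][0-9] ]]; then
        # A query header starts the next numbered file.
        ((counter++))
        printf '%s\n' "$line" > "${TEMP}_${counter}"
    else
        # Any other line is appended to the current query's file.
        printf '%s\n' "$line" >> "${TEMP}_${counter}"
    fi
done < "${LONG_RUNNING_QUERIES}"

# Remove first file, as it only contains query statistics for all long running queries.
rm -f "${TEMP}_0"

# For all files that don't contain the IP, group them together in one file.
QUERIES_TO_GRAB='master_file'
> "$QUERIES_TO_GRAB"
# ls -v sorts by version number, so temp_file_10 comes after temp_file_9.
for i in $(ls -v1 "${TEMP}"_*); do
    # -q: only the exit status matters; -F: treat the IP as fixed text, not a regex.
    if ! grep -qF -- "${IP_ADDY}" "$i"; then
        cat "$i" >> "$QUERIES_TO_GRAB"
    fi
done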
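For readers who prefer Python (which the question allows), here is a rough in-memory sketch of the same approach, with a list of chunks standing in for the numbered temp files. The function name grab_queries and the sample log are hypothetical; reading the LONG_RUNNING_QUERIES file and writing master_file are left to the caller (for example by passing open(path).readlines()).

```python
import re

# Same header test as the Bash loop: "#", whitespace, "Query", whitespace, a digit.
HEADER = re.compile(r"^#\sQuery\s[0-9]")


def grab_queries(lines, ip_addy):
    """Group lines into chunks like temp_file_0, temp_file_1, ... and return
    the concatenation of chunks 1..N that do not contain ip_addy."""
    chunks = [[]]  # chunk 0 collects everything before the first header
    for line in lines:
        if HEADER.match(line):
            chunks.append([])  # a header starts a new chunk (counter++)
        chunks[-1].append(line)
    # Skip chunk 0, mirroring rm ${TEMP}_0; plain substring test, like grep -F.
    kept = [c for c in chunks[1:] if not any(ip_addy in line for line in c)]
    return "".join("".join(c) for c in kept)


log = """stats for all long running queries
# Query 1:
select 1
# Hosts ip-127-8-8-1

# Query 2:
select 2
# Hosts ip-127-0-0-1.notusefultext
"""
master = grab_queries(log.splitlines(keepends=True), "ip-127-0-0-1")
print(master, end="")
```

Keeping the chunks in a list preserves their order, so there is no need for the ls -v sort.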

Upvotes: 0

potong

Reputation: 58578

This might work for you (GNU sed):

sed -n '/^# Query [0-9]*:/{:a;N;/^\s*$/M!ba;/Hosts.*127-0-0-1/I!p}' file

Use sed's -n option so that only explicitly printed lines are output. Focus on any line that begins with # Query n:, where n is zero or more digits (use [^:]* instead if this match is too specific). Gather up the current and following lines until (and including) an empty line; the M flag makes ^ and $ match at each line inside the collection. Test the collection for Hosts followed by 127-0-0-1 (the I flag makes this case-insensitive) and, if it is not present, print the collection. No other lines are printed.

N.B. The collection includes both the query line and the trailing empty line. If the last query in the file is not followed by an empty line, N runs out of input and, because of -n, that final query is never printed. This can be catered for by an improved version that does not call N on the last line ($!):

sed -n '/^# Query [0-9]*:/{:a;$!{N;/^\s*$/M!ba};/Hosts.*127-0-0-1/I!p}' file
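The sed logic above can be mirrored in Python as a rough sketch; sed_like_filter and the sample lines are hypothetical. A header line starts a block, following lines are appended until an empty line or the end of input (the case the $!{...} guard handles), and the block is kept only if it does not match Hosts.*127-0-0-1 case-insensitively. One assumed difference: Python's . does not cross newlines here, which is fine as long as Hosts and the IP share a line.

```python
import re

HEADER = re.compile(r"^# Query [0-9]*:")                 # sed: /^# Query [0-9]*:/
HOSTS = re.compile(r"Hosts.*127-0-0-1", re.IGNORECASE)   # sed: /Hosts.*127-0-0-1/I


def sed_like_filter(lines):
    """Yield each block (header through the next blank line, or end of input)
    that does not mention Hosts ... 127-0-0-1."""
    it = iter(lines)
    for line in it:
        if not HEADER.match(line):
            continue  # like sed -n: lines outside a block are never printed
        block = [line]
        for nxt in it:  # like N: append the following lines
            block.append(nxt)
            if not nxt.strip():  # like /^\s*$/M: stop after an empty line
                break
        # Running out of input also ends the block, so the last query is kept.
        text = "".join(block)
        if not HOSTS.search(text):
            yield text


sample = [
    "# Query 1:\n", "select 1\n", "# Hosts ip-127-8-8-1\n", "\n",
    "# Query 2:\n", "select 2\n", "# Hosts IP-127-0-0-1\n", "\n",
    "# Query 3:\n", "select 3\n", "# Hosts ip-127-9-9-1\n",  # no trailing blank line
]
for block in sed_like_filter(sample):
    print(block, end="")
```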

Upvotes: 1

Tyl

Reputation: 5262

Given that those anchors are exact and there's nothing before # Query 1:, please try this (a multi-character RS needs an awk that supports it, such as GNU awk):

awk -v RS="# Query 2" 'FNR<2 && !/# Hosts ip-127-0-0-1/'

Judging from your own attempt, if you want only the lines in that block that start with a letter:

awk -v RS="# Query 2" -F"\n" 'FNR<2 && !/# Hosts ip-127-0-0-1/{for (i=1;i<=NF;i++) if($i~ "^[A-Za-z]") print $i}'
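Here is a Python sketch of what the two awk commands do, under the same assumptions (exact anchors, nothing before # Query 1:). The function names first_block and letter_lines are hypothetical; str.split with a literal separator stands in for the multi-character RS, taking element [0] corresponds to FNR<2, and splitting on "\n" corresponds to -F"\n".

```python
import re


def first_block(text, sep="# Query 2", excluded="# Hosts ip-127-0-0-1"):
    # RS="# Query 2" with FNR<2: keep only what comes before the separator.
    record = text.split(sep, 1)[0]
    # !/# Hosts ip-127-0-0-1/: drop the record if it names the excluded host.
    return None if excluded in record else record


def letter_lines(record):
    # $i ~ "^[A-Za-z]": keep only the lines that start with an ASCII letter.
    return [line for line in record.split("\n") if re.match(r"[A-Za-z]", line)]


sample = (
    "# Query 1:\nselect a b c from\ntab\n# Hosts ip-127-8-8-1\nwhere a = '1'\n\n"
    "# Query 2:\nselect x\n"
)
rec = first_block(sample)
if rec is not None:
    print("\n".join(letter_lines(rec)))
```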

If your conditions are different, please leave a comment.

Upvotes: 0
