Reputation: 3390
I have a text file that looks like this:
# Query 1:
.
.
.
# Hosts ip-127-0-0-1
.
.
.
# Query 2:
.
.
The file may contain multiple queries. I only want to extract a query's block when its Hosts IP is NOT a certain value.
For example, here I want to capture everything starting with '# Query 1' up to the space right before '# Query 2', but ONLY if the Hosts IP does NOT match ip-127-0-0-1. This isn't an exact match, as ip-127-0-0-1 can have other text appended to the end, like ip-127-0-0-1.notusefultext.
I'm open to using awk, sed or python to solve this problem.
Upvotes: 0
Views: 127
Reputation: 8811
I assume your query file looks like the one below. Please try this Perl solution:
$ cat query_ip.txt
# Query 1:
select a b c from
tab
# Hosts ip-127-8-8-1
where a = '1'
# Query 2:
select a b c from
tab2
# Hosts ip-127-0-0-1
where a = '1'
# Query 3:
select a b c from
tab3
# Hosts ip-127-9-9-1
where a = '1'
$ perl -0777 -ne ' $_.="# Query "; while( /(# Query.+?)(# Query.+)/smg ) { $x=$1 ; $_="$2"; print $x if not $x=~/ip-127-0-0-1/ } ' query_ip.txt
# Query 1:
select a b c from
tab
# Hosts ip-127-8-8-1
where a = '1'
# Query 3:
select a b c from
tab3
# Hosts ip-127-9-9-1
where a = '1'
$
Alternatively, try this:
$ perl -0777 -ne ' while( /(# Query.+?)(# Query.+|\Z)/smg ) { $x=$1 ; $_="$2"; print "$x\n" if not $x=~/ip-127-0-0-1/ } ' query_ip.txt
# Query 1:
select a b c from
tab
# Hosts ip-127-8-8-1
where a = '1'
# Query 3:
select a b c from
tab3
# Hosts ip-127-9-9-1
where a = '1'
$
Upvotes: 0
Reputation: 3390
Final working solution:
# Split out the isolated block for each individual query and write it to a unique file.
# LONG_RUNNING_QUERIES (input file) and IP_ADDY (IP to exclude) are set earlier in the script.
TEMP='temp_file'
counter=0
while IFS= read -r line; do
  if [[ $line =~ ^#[[:space:]]Query[[:space:]][0-9] ]]; then
    # A new query header starts a new file.
    counter=$((counter + 1))
    echo "$line" > "${TEMP}_${counter}"
  else
    echo "$line" >> "${TEMP}_${counter}"
  fi
done < "${LONG_RUNNING_QUERIES}"
# Remove the first file, as it only contains query statistics for all long-running queries.
rm -f "${TEMP}_0"
# Group all files that don't contain the IP together in one file.
QUERIES_TO_GRAB='master_file'
> "$QUERIES_TO_GRAB"
for i in $(ls -v1 "${TEMP}"_*); do
  if ! grep -q "${IP_ADDY}" "$i"; then
    cat "$i" >> "$QUERIES_TO_GRAB"
  fi
done
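Since the question also allows Python, the same split-then-filter idea can be sketched in memory, without temporary files. This is only a sketch under assumptions: the function name queries_without_ip is hypothetical, every query header is assumed to start a line with "# Query " followed by a digit, and the IP is matched as a plain substring so that appended text such as ip-127-0-0-1.notusefultext still counts as a match.

```python
import re


def queries_without_ip(text: str, ip: str) -> str:
    """Return the query blocks of `text` that do not mention `ip`."""
    # Split in front of every "# Query <digit>" header. The lookahead is
    # zero-width, so each header stays attached to its own block
    # (splitting on empty matches needs Python 3.7 or later).
    blocks = re.split(r"(?m)^(?=# Query \d)", text)
    # Anything before the first header (e.g. overall statistics) does not
    # start with "# Query" and is dropped, as are blocks containing the IP.
    kept = [b for b in blocks if b.startswith("# Query") and ip not in b]
    return "".join(kept)
```

It could be used as print(queries_without_ip(open(path).read(), "ip-127-0-0-1"), end=""), where path is whatever file holds the long-running queries.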
Upvotes: 0
Reputation: 58578
This might work for you (GNU sed):
sed -n '/^# Query [0-9]*:/{:a;N;/^\s*$/M!ba;/Hosts.*127-0-0-1/I!p}' file
Use sed's -n option to print only explicitly. Focus on any line that begins with # Query n*: where n* means zero or more digits (use [^:]* if this match is too specific). Gather up the current and following lines until (and including) an empty line. Test the collection of lines for the string 127-0-0-1 and, if it is not present, print the collection. All other lines will not be printed.
N.B. The collection includes both the query line and the empty line. This may not be the case if the last query does not have an empty line as the last line of the file, which can be catered for by an improved version:
sed -n '/^# Query [0-9]*:/{:a;$!{N;/^\s*$/M!ba};/Hosts.*127-0-0-1/I!p}' file
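For readers who prefer Python, the gather-then-test logic described above could be sketched roughly as follows. It is an approximation, not a translation of the sed command: the names filter_blocks, HEADER and EXCLUDE are made up for illustration, a whitespace-only line is assumed to end a block, and the exclusion test requires Hosts and 127-0-0-1 to appear on the same line.

```python
import re
from typing import Iterable, Iterator

HEADER = re.compile(r"^# Query [0-9]*:")
EXCLUDE = re.compile(r"Hosts.*127-0-0-1", re.IGNORECASE)


def filter_blocks(lines: Iterable[str]) -> Iterator[str]:
    """Yield each query block that has no Hosts line mentioning 127-0-0-1."""
    block = None  # lines of the block being gathered, or None outside a block
    for line in lines:
        if block is None:
            # Outside a block: ignore everything until a query header appears.
            if HEADER.match(line):
                block = [line]
            continue
        block.append(line)
        if line.strip() == "":
            # A blank line closes the block (and is kept as part of it).
            text = "".join(block)
            if not EXCLUDE.search(text):
                yield text
            block = None
    if block is not None:
        # The last block may reach end of input without a closing blank line.
        text = "".join(block)
        if not EXCLUDE.search(text):
            yield text
```

Passing an open file object works, since iterating over it yields lines with their newlines: for block in filter_blocks(open("file")): print(block, end="").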
Upvotes: 1
Reputation: 5262
Given that those anchors are exact and there's nothing before # Query 1:, please try this:
awk -v RS="# Query 2" 'FNR<2 && !/# Hosts ip-127-0-0-1/'
Judging from your own attempt, if you want only the lines that start with a letter in the block you described:
awk -v RS="# Query 2" -F"\n" 'FNR<2 && !/# Hosts ip-127-0-0-1/{for (i=1;i<=NF;i++) if($i~ "^[A-Za-z]") print $i}'
If the conditions are somewhat different, please leave a comment.
Upvotes: 0