Reputation: 1206
I am trying the code below and it is not working as expected. I am new to regex. Please share your ideas. Thanks in advance.
test.xml
<?xml version="1.0"?>
<audit>
<interfaces>
<interface_dtls>ABCD,ABCD 123</interface_dtls>
<interface_dtls>TESTING,123 TEST</interface_dtls>
</interfaces>
</audit>
I am trying the following Unix commands:
#!/bin/bash
for line in `cat test.xml | grep -oP "(?<=interface_dtls>)[^<]+"`; do
echo $line --Displaying line only for debugging purpose
interface_code=`echo $line | awk -F ',' '{print $1}'`
prcdr_cd=`echo $line | awk -F ',' '{print $2}'`
hive -e "select * from table \
where sub_sys_cd='$interface_code' and data_prcdr_desc='$prcdr_cd';"
done
Actual "ECHO" output:
ABCD,ABCD
TESTING,123
Expected "ECHO" output:
ABCD,ABCD 123
TESTING,123 TEST
Because of the missing info (the text after the space), my query is not working as expected.
Upvotes: 1
Views: 280
Reputation: 1206
After a little research I was able to resolve the issue. Thanks to https://stackoverflow.com/users/5291015/inian , https://stackoverflow.com/users/4941495/kusalananda and https://stackoverflow.com/users/548225/anubhava for the helpful insights.
test.xml
<?xml version="1.0"?>
<audit>
<interfaces>
<interface_dtls>ABCD,ABCD 123</interface_dtls>
<interface_dtls>TESTING,123 TEST</interface_dtls>
</interfaces>
</audit>
Before:
#!/bin/bash
for line in `cat test.xml | grep -oP "(?<=interface_dtls>)[^<]+"`; do
echo $line --Displaying line only for debugging purpose
interface_code=`echo $line | awk -F ',' '{print $1}'`
prcdr_cd=`echo $line | awk -F ',' '{print $2}'`
hive -e "select * from table \
where sub_sys_cd='$interface_code' and data_prcdr_desc='$prcdr_cd';"
done
After:
#!/bin/bash
# Split the command output on newlines only, so spaces inside a value are kept.
IFS=$'\n'
for line in $(grep -oP "(?<=interface_dtls>)[^<]+" test.xml); do
echo "$line"  # Displaying line only for debugging purposes
interface_code=$(echo "$line" | awk -F ',' '{print $1}')
prcdr_cd=$(echo "$line" | awk -F ',' '{print $2}')
hive -e "select * from table \
where sub_sys_cd='$interface_code' and data_prcdr_desc='$prcdr_cd';"
done
"ECHO" output:
ABCD,ABCD 123
TESTING,123 TEST
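The reason the original loop lost the text after the space is that an unquoted command substitution in a for loop is split on every character in IFS, which by default is space, tab and newline. The following is a minimal sketch that reproduces this without the XML file; the `data` variable is a hypothetical stand-in for the grep output, and the array names are only for illustration.

```shell
#!/bin/bash
# Hypothetical sample standing in for the grep output (two values, one per line).
data=$'ABCD,ABCD 123\nTESTING,123 TEST'

# Default IFS (space, tab, newline): the space inside each value also splits it.
default_split=()
for word in $data; do
  default_split+=("$word")
done

# Newline-only IFS: each value survives as one word.
saved_ifs=$IFS
IFS=$'\n'
newline_split=()
for word in $data; do
  newline_split+=("$word")
done
IFS=$saved_ifs

echo "default IFS: ${#default_split[@]} words"
echo "newline IFS: ${#newline_split[@]} words"
```

The quoting matters here: in bash, `$'\n'` is a newline, while `'$\n'` is the three literal characters `$`, `\` and `n`. Restoring IFS afterwards avoids surprising later commands in the same script.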
Upvotes: 1
Reputation: 15633
The xml_grep utility was mentioned in another answer. This answer uses XMLStarlet instead, which can also validate and modify XML files on the command line:
$ xml sel -t -v '//interface_dtls' -nl data.xml
ABCD,ABCD 123
TESTING,123 TEST
Upvotes: 1
Reputation: 85865
Use xml_grep, the recommended option for parsing here, since grep is not an XML-aware tool.
$ xml_grep 'interface_dtls' file --text_only
ABCD,ABCD 123
TESTING,123 TEST
One could also use grep, as pointed out by anubhava in the comments. It is probably not the best way to do it, but it can be done for a one-time debug. For proper functionality, use an XML-aware command (e.g. xmllint or xml_grep).
$ grep -oP "(?<=<interface_dtls>)[^<]+" xml_file
ABCD,ABCD 123
TESTING,123 TEST
The skeleton code for extracting the individual fields from the command output is below. I will leave it up to you to tweak it as you need. Do not use the outdated `` style command substitution; use $(...) wherever applicable.
#!/bin/bash
# Read each whole line; IFS= and -r keep embedded spaces and backslashes intact.
while IFS= read -r line;
do
interface_code=$(echo "$line" | awk -F ',' '{print $1}')
prcdr_cd=$(echo "$line" | awk -F ',' '{print $2}')
echo "$interface_code" "$prcdr_cd"
done < <(xml_grep 'interface_dtls' file --text_only)
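As an alternative to calling awk twice per line, read itself can split on the comma. This is a minimal sketch, assuming each value contains exactly one comma between the two fields. The `extract_dtls` function is a hypothetical stand-in for the xml_grep call so the snippet runs without extra tools, and the `pairs` array exists only to make the result easy to inspect.

```shell
#!/bin/bash
# Hypothetical stand-in for: xml_grep 'interface_dtls' file --text_only
extract_dtls() {
  printf '%s\n' 'ABCD,ABCD 123' 'TESTING,123 TEST'
}

pairs=()
# IFS=, applies only to this read. The last variable receives the rest
# of the line, so the space inside the second field is preserved.
while IFS=, read -r interface_code prcdr_cd; do
  pairs+=("$interface_code|$prcdr_cd")
  echo "code=$interface_code desc=$prcdr_cd"
done < <(extract_dtls)
```

Because the loop is fed through process substitution rather than a pipe, it runs in the current shell, so variables set inside it remain available after `done`.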
Upvotes: 2