goks

Reputation: 1206

Regexp is not working as expected in unix

I am trying the code below and it is not working as expected. I am new to regex. Please share your ideas. Thanks in advance.

test.xml

<?xml version="1.0"?>
<audit>
    <interfaces>
        <interface_dtls>ABCD,ABCD 123</interface_dtls>
        <interface_dtls>TESTING,123 TEST</interface_dtls>
    </interfaces>
</audit>

I am running the following script:

#!/bin/bash
for line in `cat  test.xml | grep -oP "(?<=interface_dtls>)[^<]+"`; do
    echo $line  --Displaying line only for debugging purpose
    interface_code=`echo $line | awk -F ',' '{print $1}'`
    prcdr_cd=`echo $line | awk -F ',' '{print $2}'`
    hive -e "select * from table \
    where sub_sys_cd='$interface_code' and data_prcdr_desc='$prcdr_cd';"
done

Actual "ECHO" output:

ABCD,ABCD
TESTING,123

Expected "ECHO" output:

ABCD,ABCD 123
TESTING,123 TEST

Because of the missing information (the text after the space), my query is not working as expected.

Upvotes: 1

Views: 280

Answers (3)

goks

Reputation: 1206

After a bit of research I was able to resolve the issue. Thanks to https://stackoverflow.com/users/5291015/inian , https://stackoverflow.com/users/4941495/kusalananda and https://stackoverflow.com/users/548225/anubhava for the helpful insights.

test.xml

<?xml version="1.0"?>
<audit>
    <interfaces>
        <interface_dtls>ABCD,ABCD 123</interface_dtls>
        <interface_dtls>TESTING,123 TEST</interface_dtls>
    </interfaces>
</audit>

Before:

#!/bin/bash
for line in `cat  test.xml | grep -oP "(?<=interface_dtls>)[^<]+"`; do
    echo $line  --Displaying line only for debugging purpose
    interface_code=`echo $line | awk -F ',' '{print $1}'`
    prcdr_cd=`echo $line | awk -F ',' '{print $2}'`
    hive -e "select * from table \
    where sub_sys_cd='$interface_code' and data_prcdr_desc='$prcdr_cd';"
done

After:

#!/bin/bash
# Split the command output on newlines only, so spaces inside a value survive
IFS=$'\n'
for line in $(grep -oP "(?<=interface_dtls>)[^<]+" test.xml); do
    echo "$line"  # Displaying line only for debugging purposes
    interface_code=$(echo "$line" | awk -F ',' '{print $1}')
    prcdr_cd=$(echo "$line" | awk -F ',' '{print $2}')
    hive -e "select * from table \
    where sub_sys_cd='$interface_code' and data_prcdr_desc='$prcdr_cd';"
done

"ECHO" output:

ABCD,ABCD 123
TESTING,123 TEST
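
For context, here is a small sketch of why the original loop lost the text after the space, and of a line-by-line alternative that does not change IFS for the whole script. The variable names (extracted, split_count, line_count, pairs) are made up for illustration, and the sample string simply stands in for the grep output.

```shell
#!/bin/bash
# Sample data standing in for the two interface_dtls values printed by grep.
extracted=$'ABCD,ABCD 123\nTESTING,123 TEST'

# An unquoted command substitution in a for loop is split on spaces, tabs
# and newlines (the default IFS), so each value is broken at its space.
split_count=0
for word in $(printf '%s\n' "$extracted"); do
    split_count=$((split_count + 1))
done

# Reading line by line keeps each value whole; IFS=, applies only to this
# read and splits the line into the two comma-separated fields.
line_count=0
pairs=()
while IFS=, read -r interface_code prcdr_cd; do
    line_count=$((line_count + 1))
    pairs+=("$interface_code|$prcdr_cd")
done <<< "$extracted"

echo "for loop words: $split_count"
echo "while loop lines: $line_count"
printf '%s\n' "${pairs[@]}"
```

The for loop sees four words where the while loop sees two lines, which is the difference between the actual and the expected output in the question.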

Upvotes: 1

Kusalananda

Reputation: 15633

The xml_grep utility was mentioned in another answer. This answer uses XMLStarlet instead, which can also validate and modify XML files on the command line:

$ xml sel -t -v '//interface_dtls' -nl data.xml
ABCD,ABCD 123
TESTING,123 TEST
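
As a rough sketch of how this output could feed the query from the question, the function below reads "code,description" lines and prints the query text for each one. build_queries is a hypothetical helper name, not part of XMLStarlet or Hive, and the sample input stands in for the XMLStarlet output shown above.

```shell
#!/bin/bash
# build_queries is a hypothetical helper for illustration only.
# It reads "code,description" lines on stdin and prints one query per line.
build_queries() {
    local interface_code prcdr_cd
    # IFS=, splits at the comma only, so "ABCD 123" stays in one field.
    while IFS=, read -r interface_code prcdr_cd; do
        printf "select * from table where sub_sys_cd='%s' and data_prcdr_desc='%s';\n" \
            "$interface_code" "$prcdr_cd"
    done
}

# Sample input standing in for the extracted interface_dtls values.
queries=$(printf '%s\n' 'ABCD,ABCD 123' 'TESTING,123 TEST' | build_queries)
printf '%s\n' "$queries"
```

With XMLStarlet installed, the same function could presumably be fed directly, as in xml sel -t -v '//interface_dtls' -nl data.xml | build_queries, and each printed line passed to hive -e once it looks right.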

Upvotes: 1

Inian

Reputation: 85865

Use xml_grep, the recommended option for parsing this, since grep is not an XML-aware tool.

$ xml_grep 'interface_dtls' file --text_only
ABCD,ABCD 123
TESTING,123 TEST

One could also use grep, as pointed out by anubhava in the comments. It is probably not the best way to do it, but it can be done for a one-time debug. For proper functionality, use an XML-aware command (e.g. xmllint or xml_grep).

$ grep -oP "(?<=<interface_dtls>)[^<]+" xml_file
ABCD,ABCD 123
TESTING,123 TEST

The skeleton code for extracting the individual fields from the command output is below. I will leave it up to you to tweak it as you need. Do not use the outdated `` style of command substitution; use $(...) instead wherever applicable.

#!/bin/bash

# IFS=, splits each line at the comma only, so spaces inside a field are kept
while IFS=, read -r interface_code prcdr_cd
do
    echo "$interface_code" "$prcdr_cd"

done < <(xml_grep 'interface_dtls' file --text_only)

Upvotes: 2
