qu1x0tc

Reputation: 53

Use awk to extract value from a line

I have these two lines within a file:

<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>

where I'd like to get the following as output using awk or sed:

3
50000

Using this sed command does not work as I had hoped, and I suspect this is due to the presence of the quotes and delimiters in my line entry.

sed -n '/WORD1/,/WORD2/p' /path/to/file

How can I extract the values I want from the file?

Upvotes: 5

Views: 13183

Answers (6)

jaybee

Reputation: 955

Ashkan's awk solution is straightforward, but let me suggest a sed solution that accepts non-integer numbers:

sed -n 's/[^>]*>\([.[:digit:]]*\)<.*/\1/p' input.txt

This extracts the number between the first > character of the line and the following <. With this regex the "number" can be an empty string. If you don't want to accept an empty string, add the -r option to sed and replace \([.[:digit:]]*\) with ([.[:digit:]]+).
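As a rough sketch of the stricter variant described above: the third input line is a hypothetical empty element added only for illustration, and extract_nonempty is just a name chosen for this example. It uses -E, which GNU sed treats the same as -r.

```shell
# Two lines from the question plus one hypothetical empty element.
input='<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
<empty-value></empty-value>'

# -E turns on extended regexes, so the group needs no backslashes and
# + demands at least one digit or dot between > and <.
extract_nonempty() {
    sed -En 's/[^>]*>([.[:digit:]]+)<.*/\1/p'
}

result=$(printf '%s\n' "$input" | extract_nonempty)
printf '%s\n' "$result"
```

Because of -n and the p flag, the empty element produces no output line at all instead of a blank one.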

Upvotes: 0

Tom Fenech

Reputation: 74596

Looks like XML to me, so assuming it forms part of some valid XML, e.g.

<root>
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
</root>

You can use Perl's XML::Simple and do something like this:

perl -MXML::Simple -E '$xml = XMLin("file"); say $xml->{"first-value"}->{"content"}; say $xml->{"second-value-limit"}'

Output:

3
50000

If the XML structure is more complicated, then you may have to drill down a bit deeper to get to the values you want. If that's the case, you should edit the question to show the bigger picture.

Upvotes: 0

David C. Rankin

Reputation: 84521

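A script solution using parameter expansion:

#!/bin/bash

# Read each line, including a final line that lacks a trailing newline
while IFS= read -r line || test -n "$line" ; do
    value="${line%<*}"              # drop the closing tag (from the last < onward)
    printf "%s\n" "${value##*\>}"   # drop everything up to the last remaining >
done <"$1"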

output:

$ ./ltags.sh dat/ltags.txt
3
50000
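To make the two expansions easier to follow, here is a small walk-through on a single line from the question. It is only a sketch of what the loop body does for each line.

```shell
line='<first-value system-property="unique.setting.limit">3</first-value>'

# %<* removes the shortest suffix matching '<*', i.e. the closing tag.
value="${line%<*}"

# ##*> removes the longest prefix matching '*>', i.e. the opening tag.
result="${value##*>}"

printf '%s\n' "$value" "$result"
```

The first printed line is the opening tag followed by 3, and the second is the bare value 3.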

Upvotes: 0

Technext

Reputation: 8107

Using sed:

sed -E 's/.*limit"*>([0-9]+)<.*/\1/' file


Explanation:

.* matches everything that comes before the string limit.

limit"* handles both lines: one has limit" and the other has just limit.

([0-9]+) matches one or more digits, and only digits, as stated in your requirement.

\1 is a backreference to the first capturing group. When a pattern wraps all or part of its content in a pair of parentheses, the text matched by that part is captured and stored temporarily so that it can be reused in the replacement. For more details, please refer to https://www.inkling.com/read/introducing-regular-expressions-michael-fitzgerald-1st/chapter-4/capturing-groups-and
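Note that without -n, sed also prints lines that do not match, unchanged. A minimal sketch, assuming the two lines sit inside a <root> wrapper (an assumption for illustration, since the question shows only the two lines), that adds -n and the p flag so only substituted lines are printed:

```shell
# The <root> wrapper lines are hypothetical; they stand in for any
# surrounding lines that do not contain a value.
input='<root>
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
</root>'

# -n suppresses automatic printing; the p flag prints a line only
# when the substitution succeeded on it.
result=$(printf '%s\n' "$input" | sed -En 's/.*limit"*>([0-9]+)<.*/\1/p')
printf '%s\n' "$result"
```

With this input, only the two values are printed and the wrapper lines are dropped.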

Upvotes: 0

vks

Reputation: 67968

sed -e 's/[a-zA-Z.<\/>=" -]//g' file

Upvotes: 0

a5hk

Reputation: 7834

awk -F'[<>]' '{print $3}' input.txt

input.txt:

<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>

Output:

3
50000
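To see why $3 is the right field, here is a small sketch that prints every field awk produces for one of the sample lines when splitting on < or >:

```shell
line='<second-value-limit>50000</second-value-limit>'

# Print each field with its index; brackets make empty fields visible.
fields=$(printf '%s\n' "$line" |
    awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

The line starts with <, so field 1 is empty, field 2 is the tag name, and field 3 is the value.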

Upvotes: 8
