Mr_LinDowsMac

Reputation: 2702

How to extract from a file text between tokens using bash scripts

I was reading the question "Extract lines between 2 tokens in a text file using bash" because I have a very similar problem. I need to extract text from this XML file (and save it to a variable before printing it):

<!--more labels up this line-->
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<!--more labels and text down this line-->

I only need the contents of value= (obviously without the brackets and without the 'value=' part). I think the script first has to search for "GUI/LastVMSelected" to find this line, because other lines may have a similar value field, and the value on this particular line is the one I want.

Upvotes: 2

Views: 7600

Answers (4)

mpenkov

Reputation: 21906

Use this:

for f in `grep "GUI/LastVMSelected" filename.txt | cut -d " " -f3`; do echo ${f:7:36}; done
  • grep gets you only the lines you need
  • cut splits the lines using some separator, and returns the Nth result of the split
  • -d " " sets the separator to space
  • -f3 returns the third result (1-based indexing)
  • ${f:7:36} extracts the substring starting at index 7 that is 36 characters long. This gets rid of the leading value=" and trailing slash, etc.

Obviously, if the order of the fields changes (or the line is indented with spaces, which shifts the field numbers that cut sees), this will break. But if you're just after something quick and dirty that works, this should be it.
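
As a worked example of the cut and substring steps, here is a minimal sketch run against the sample line from the question. The variable names (line, field, uuid) are only for illustration, and it assumes the line has no leading spaces.

```bash
#!/usr/bin/env bash
# Sample line copied from the question (no leading indentation assumed).
line='<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>'

# Third space-separated field: value="14cd...89"/>
field=$(printf '%s\n' "$line" | cut -d " " -f3)

# Skip the 7 characters of value=" and keep the next 36 (the length of the ID).
uuid=${field:7:36}

printf '%s\n' "$uuid"
```

The ${field:7:36} form is a bash substring expansion, so this needs bash rather than a plain POSIX sh.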

Upvotes: 1

Dennis Williamson

Reputation: 360315

Using my answer from the question you linked:

sed -n '/<!--more labels up this line-->/{:a;n;/<!--more labels and text down this line-->/b;\|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p;ba}' inputfile

Explanation:

  • -n - don't do an implicit print
  • /<!--more labels up this line-->/{ - if the starting marker is found, then
    • :a - label "a"
      • n - read the next line
      • /<!--more labels and text down this line-->/b - if it's the ending marker, branch to the end of the script, which leaves the loop
      • \|GUI/LastVMSelected| - if the line matches the string
        • s/.*value="\([^"]*\)".*/\1/p - replace the whole line with the string after 'value="' and before the next quote, and print it
    • ba - branch to label "a"
  • } end if
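
To try this out, here is a small sketch that builds a throwaway test file and runs the same kind of script over it, written one command per line for readability. The file contents and the sample and result variables are made up for the demonstration, and the substitution is written to match the whole line so that only the value is printed.

```bash
#!/usr/bin/env bash
# Throwaway test file; its contents are invented for this demonstration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<!--more labels up this line-->
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<ExtraDataItem name="GUI/Other" value="not-this-one"/>
<!--more labels and text down this line-->
<ExtraDataItem name="GUI/LastVMSelected" value="outside-the-markers"/>
EOF

# Loop over the lines after the start marker, stop at the end marker,
# and print only the value from the GUI/LastVMSelected line in between.
result=$(sed -n '
/<!--more labels up this line-->/{
:a
n
/<!--more labels and text down this line-->/b
\|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p
ba
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The last line of the test file is there to show that a matching line outside the two markers is ignored.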

Upvotes: 0

Jan Hudec

Reputation: 76316

If they are on the same line (as they seem to be from your example), it's even easier. Just:

sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p'

Explanation:

  • -n: Suppress default print
  • /name="GUI\/LastVMSelected"/: only lines matching this pattern
  • s/.*value="\([^"]*\)".*/\1/p
    • substitute everything, capturing the parenthesized part (the value of value)
    • and print the result
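
Since the question asks for the value in a variable, one way to get it there is command substitution. A minimal sketch, assuming a throwaway sample file; the names config and vm_id are just for illustration:

```bash
#!/usr/bin/env bash
# Throwaway sample file; replace "$config" with the path to your real XML file.
config=$(mktemp)
cat > "$config" <<'EOF'
<ExtraDataItem name="GUI/Other" value="not-this-one"/>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
EOF

# Capture sed's output in a variable instead of printing it directly.
vm_id=$(sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' "$config")

echo "$vm_id"
rm -f "$config"
```

If more than one line matches, vm_id will hold one value per line, so it may be worth checking for that before using it.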

Upvotes: 3

Brian Agnew

Reputation: 272337

I'm assuming that you're extracting from an XML document. If that is the case, have a look at the XMLStarlet command-line tools for processing XML. There's some documentation for querying XML docs here.
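
As a rough sketch of what that could look like, assuming xmlstarlet is installed under that name and that the file declares no default XML namespace (a real settings file may well declare one, in which case the XPath would need namespace handling). The Global/ExtraData wrapper elements and the variable names are invented for the demonstration:

```bash
#!/usr/bin/env bash
# Throwaway sample document; the wrapper elements are hypothetical.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Global>
  <ExtraData>
    <ExtraDataItem name="GUI/Other" value="not-this-one"/>
    <ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
  </ExtraData>
</Global>
EOF

vm_id=""
if command -v xmlstarlet >/dev/null 2>&1; then
  # Select the value attribute of the element whose name attribute matches.
  vm_id=$(xmlstarlet sel -t -v '//ExtraDataItem[@name="GUI/LastVMSelected"]/@value' "$sample")
  echo "$vm_id"
else
  echo "xmlstarlet not found; install it to run this example" >&2
fi
rm -f "$sample"
```

Unlike the grep and sed approaches, this does not depend on the attributes being in a particular order or on the element sitting on a single line.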

Upvotes: 1
