Mr_LinDowsMac

Reputation: 2702

How to extract a specific text of an already extracted XML text in bash script

I managed to extract a value from an XML file:

<--more labels up this line>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<--more labels and text down this line-->

by using this:

UUID=$(sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' inputfile.xml)
echo "$UUID"

I have this result in console:

14cd3204-4774-46b8-be89-cc834efcba89

That's it! But now, I need to use that UUID to get to another part of the same XML file, that I didn't show before. I simplified the XML file just to show the most relevant labels:

      <--more labels up this line>
 <ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
      <--more labels and text down this line-->
      <MachineEntry uuid="{14cd3204-4774-46b8-be89-cc834efcba89}" src="Machines/SomeMachine/SomeMachine.xml"/>
 <--more labels and text down this line-->

I need to get "SomeMachine" on its own, without the path or the extension. I tried adding a second sed line myself:

UUID=$(sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' inputfile.xml)    
LastVMname=$(sed -ne '/MachineEntry uuid="{'$UUID'}"/s/.*src="Machines\([^"]*\).xml".*/\1/p' inputfile.xml)
    echo $LastVMname

But I get this output:

/SomeMachine/SomeMachine

and I don't know how to get rid of the leading "/SomeMachine/". I just need "SomeMachine". The sed documentation is quite confusing :S

Upvotes: 1

Views: 317

Answers (1)

Dennis Williamson

Reputation: 359935

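You can use an alternate delimiter for the substitute command, so the slashes in the pattern need no escaping, and key on the last slash in the data: the greedy .* before it swallows the directory part of the path.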

LastVMname=$(sed -ne '/MachineEntry uuid="{'"$UUID"'}"/s|.*src="Machines.*/\(.*\)\.xml".*|\1|p' inputfile.xml)
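
A minimal, self-contained sketch of the same idea, assuming a two-line sample file written to a temporary path (the sample file is made up for illustration and only contains the two elements shown in the question):

```shell
# Hypothetical sample file, reduced to the two relevant lines from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<MachineEntry uuid="{14cd3204-4774-46b8-be89-cc834efcba89}" src="Machines/SomeMachine/SomeMachine.xml"/>
EOF

# First pass: pull the UUID out of the ExtraDataItem line.
UUID=$(sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' "$tmp")

# Second pass: with | as the delimiter, the / in the pattern is plain data.
# The greedy .* in front of it runs up to the last slash of the path,
# so the capture group holds only the file name without ".xml".
LastVMname=$(sed -ne '/MachineEntry uuid="{'"$UUID"'}"/s|.*src="Machines.*/\(.*\)\.xml".*|\1|p' "$tmp")

echo "$LastVMname"
rm -f "$tmp"
```

This still breaks as soon as the attributes are reordered or an element is split across lines, which is why it is only a sketch.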

However, this way madness lies.

You should use an XML parser such as xmlstarlet. Something like:

uuid=$(xmlstarlet sel -t -m "//ExtraDataItem[@name='GUI/LastVMSelected']" -v @value inputfile.xml)
LastVMname=$(xmlstarlet sel -t -m "//MachineEntry[@uuid='{$uuid}']" -v @src inputfile.xml)
LastVMname=${LastVMname##*/}    # strip up to and including the last slash
LastVMname=${LastVMname%.*}     # strip the extension
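
The two parameter expansions can be tried on their own, without any XML tooling. A small sketch, assuming the src value from the question has already been read into a variable (the variable names here are only for illustration):

```shell
# src value exactly as it appears in the question's MachineEntry element
src="Machines/SomeMachine/SomeMachine.xml"

name=${src##*/}   # remove the longest prefix matching */  (the directory part)
name=${name%.*}   # remove the shortest suffix matching .* (the extension)

echo "$name"
```

The doubled ## asks for the longest match, so every directory level is removed in one step, while the single % takes the shortest match, so only the final extension is dropped.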

Upvotes: 2
