jobs

Reputation: 63

Get attribute value from xml tags with shell script and convert to csv

The task:

I'm trying to get the attribute value from XML tags with a shell script, split each value up, and save the parts in a .csv file.

This is what the XML looks like:

<host>
  <servers>
    <server name="Type1Name1-Port1" >...</server>
    <server name="Type2Name2-Port2" >...</server>
    <server name="Type3Name3-Port3" >...</server>
    ...
    <server name="TypexNamex-Portx" >...</server>
  </servers>
</host>

I'd like to get the values of the "name" attribute and split them up as follows:
Type;Name;Port

The output csv file I want should look like this:

Type1;Name1;Port1
Type2;Name2;Port2
Type3;Name3;Port3
...
Typex;Namex;Portx

The problem:

I can use any shell-language I want to. I prefer bash and ksh.

My questions:

EDIT:

Example data of a server-name:

T-TTT_AAA-A-SSS-PPPP

Where T represents the Type, A the Application name, S the Server name, and P the Port. The lengths of T, A and S are variable. The length of P is constant.

Upvotes: 2

Views: 2503

Answers (3)

JJoao

Reputation: 5357

xidel -e '//server/@name' f.xml |  sed ...

Upvotes: 0

Walter A

Reputation: 20032

Without xmllint you can parse input like

<host>
  <servers>
    <server name="Type1_Name1-Port1" >...</server>
    <server name="Type-2_Name2-Port2" >...</server>
    <server name="Type3_Name-3-Port3" >...</server>
  </servers>
</host>

with

sed -n '/<server name=/ s/[^"]*"\([^_]*\)_\([^"]*\)-\([^"]*\)".*/\1;\2;\3/p' inputfile
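As a quick check, the command can be run against the sample above. This is only a sketch: the temporary file created with mktemp and the variable names inputfile and result are illustrative, not part of the original answer.

```shell
# Illustrative setup: write the sample XML from above to a temporary file.
inputfile=$(mktemp)
cat > "$inputfile" <<'EOF'
<host>
  <servers>
    <server name="Type1_Name1-Port1" >...</server>
    <server name="Type-2_Name2-Port2" >...</server>
    <server name="Type3_Name-3-Port3" >...</server>
  </servers>
</host>
EOF

# Same sed command as above: the type is everything before the "_",
# the port is everything after the last "-" inside the quotes.
result=$(sed -n '/<server name=/ s/[^"]*"\([^_]*\)_\([^"]*\)-\([^"]*\)".*/\1;\2;\3/p' "$inputfile")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$inputfile"
```

This prints Type1;Name1;Port1, Type-2;Name2;Port2 and Type3;Name-3;Port3, one per line. Hyphens inside the type or name survive because [^_]* stops only at the underscore, and the greedy [^"]* gives back only the last hyphen-separated piece as the port.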

Upvotes: 1

Aserre

Reputation: 5072

Here is what I came up with, using only common tools, xmllint and sed:

echo 'cat //host/servers/server/@name' | xmllint --shell data.xml | sed -n 's: name=\"\([A-Z][a-z0-9]*\)\([A-Z][a-z0-9]*\)-\(.*\)\":\1;\2;\3:p'

The sed part is done according to OP's examples at the moment of posting.

Breakdown:

  • echo 'cat //host/servers/server/@name' : we pass this command to xmllint. It selects the name attribute of all the nodes inside <host><servers><server ...> ... </server></servers></host>
  • xmllint --shell data.xml : opens data.xml in xmllint's interactive shell, which reads the command from standard input and executes it.
  • sed -n 's: name=\"\([A-Z][a-z0-9]*\)\([A-Z][a-z0-9]*\)-\(.*\)\":\1;\2;\3:p' : we process the output of xmllint to keep only the data we are interested in
    • xmllint will produce the following output : name="Type1Name1-Port1"
    • We define 3 capture groups : a capital letter followed by any number of lowercase letters or digits (for Type), another capital letter followed by any number of lowercase letters or digits (for Name), and everything between the - and the closing " character (for Port)
    • We tell sed to print only the matched strings, separated by semicolons

Output :

Type1;Name1;Port1
Type2;Name2;Port2
Type3;Name3;Port3
Typex;Namex;Portx
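The sed stage can also be tried on its own, without xmllint installed. The sketch below feeds it simulated input; the assumption (taken from the breakdown above) is that xmllint emits lines of the form name="..." with a leading space, here interleaved with separator lines that sed should skip. The variable names are illustrative, and the quotes are written unescaped, which matches the same text inside a single-quoted sed script.

```shell
# Simulated xmllint output (assumption: attribute lines look like ' name="..."',
# with ' -------' separator lines in between that must be ignored).
xmllint_output=' name="Type1Name1-Port1"
 -------
 name="Type2Name2-Port2"
 -------
 name="TypexNamex-Portx"'

# Only lines matching the pattern are printed (-n plus the p flag),
# rewritten as Type;Name;Port.
csv=$(printf '%s\n' "$xmllint_output" |
  sed -n 's: name="\([A-Z][a-z0-9]*\)\([A-Z][a-z0-9]*\)-\(.*\)":\1;\2;\3:p')
printf '%s\n' "$csv"
```

The separator lines do not match the pattern, so they are dropped rather than copied into the CSV.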

EDIT:

To fit the pattern you indicated in the comments, you'll just have to change the sed regex, for instance :

sed -n 's: name=\"\(.*\)_\(.*\)-\(.\{4\}\)\":\1;\2;\3:p'

This will match the format T-TTT_AAA-A-SSS-PPPP, with any length for the type and server name and exactly four characters for the port. Try tweaking the regex, or ask another question under the regex tag, if this is not exactly what you need.
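A small check of that pattern, as a sketch: the server name below is made up to follow the T-TTT_AAA-A-SSS-PPPP layout, and the variable names are illustrative.

```shell
# Hypothetical server name following the T-TTT_AAA-A-SSS-PPPP layout.
line=' name="W-EB1_shop-a-srv01-8080"'

# Group 1: everything before the "_"; group 3: the last four characters
# before the closing quote; group 2: whatever lies between them.
fields=$(printf '%s\n' "$line" |
  sed -n 's: name="\(.*\)_\(.*\)-\(.\{4\}\)":\1;\2;\3:p')
printf '%s\n' "$fields"
```

This prints W-EB1;shop-a-srv01;8080. The application name and server name stay together in the second field, because both may contain hyphens and the pattern has no rule for telling which hyphen separates them.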

Upvotes: 1
