Christian Townsend
Christian Townsend

Reputation: 301

How can I split a multi-line String Variable in Python?

I have String variables of the following format:

<?xml version="1.0" encoding="UTF-8"?>
<packages>
    <package name="firstPackage" version="LATEST" params="xxx" force="false"/>
    <package name="secondPackage" version="LATEST" params="xxx" force="false"/>
</packages>

My goal is to split the String + add to a list such that I end up with the following:

packages = ["<package name="firstPackage" version="LATEST" params="xxx" force="false"/>", "<package name="secondPackage" version="LATEST" params="xxx" force="false"/>"]

I am fairly new to Python but I believe the order of operations is:

  1. Split string by new line ("\n")
  2. For each line that is split this way, add to a list

To do this, I think I need a nested loop. The first loop to go through the initial multi-line Strings, but then another inside to break apart each line and store it as a separate list item. Below is some semi-pseudo code I have written:

variableList = [] # list containing Multi-line strings
packageList = [] # list to contain separated items
for item in variableList:
    stringVariable = item
    splitPackage = (stringVariable.split("\n"))
    for package in splitPackage:
        packageList.append(splitPackage)

Any help/advise would be appreciated.

Upvotes: 1

Views: 584

Answers (2)

Sujal Singh
Sujal Singh

Reputation: 582

A suggestion, don't use splitlines. Use a library that parses xml.

Python comes with a library to parse xml, to achieve your goal try this:

import xml.etree.ElementTree as ET

# your xml wasn't valid
source = """<?xml version="1.0" encoding="utf-8"?>
<packages>
    <package name="firstPackage" version="LATEST" params="xxx" force="false"/>
    <package name="secondPackage" version="LATEST" params="xxx" force="false"/>
</packages>"""

root = ET.fromstring(source)
for child in root:
    print(child.attrib)

This will output:

{'name': 'firstPackage', 'version': 'LATEST', 'params': 'xxx', 'force': 'false'}
{'name': 'secondPackage', 'version': 'LATEST', 'params': 'xxx', 'force': 'false'}

Plus point with this approach is you can convert structured data into any format you'd like, for example into the list you want:

packages = []
for child in root:
    package_attributes = child.attrib.values()
    desired_format = '<package name="{}" version="{}" params="{}" force="{}"/>'
    packages.append(desired_format.format(*package_attributes))

This will give you your desired list.

By the way you could use a different library to parse xml, since the documentation for this one says:

The xml.etree.ElementTree module is not secure against maliciously constructed data

Upvotes: 1

julianofischer
julianofischer

Reputation: 185

package_list = []
spl = multlinestring.splitlines()

for line in spl:
    if line.find('name') != -1:
        # trim leading/trailing whitespaces
        line = line.strip()
        package_list.append(line)

Upvotes: 1

Related Questions