Reputation: 41
I have an odd XML document that contains phone number details. I need to export it to a CSV file, but the problem is that it's not formatted correctly. All of the elements are inside <string> tags, and some "Name" fields are repeated, though not in exactly the same way (as in the example below, most repeated lines contain extra spaces or commas). All the "Numbers" are indented relative to the "Name" fields.
<string>example1</string>
<string>014584111</string>
<string>example2</string>
<string>04561212123</string>
<string>example3</string>
<string>+1 156151561</string>
<string>example4</string>
<string>564513212</string>
<string>example3, </string>
<string>example4 </string>
How can I convert this into CSV format without the repeated content using Python? Here's an example of the expected output:
FullName PhoneNumber
example1 014584111
example2 04561212123
example3 +1 156151561
example4 564513212
Upvotes: 0
Views: 90
Reputation: 1884
Of course this can be done. If you can describe the process in plain language, you can also program it.
Example: remove the <string> and </string> tags.
So you now need to make some decisions, such as:
Is the import file huge? Then it will probably not fit into memory and needs to be processed line by line. Or will it fit in memory?
Will this program be needed many times, or is it just a one-time conversion?
Then you can divide the problem into smaller sub-problems and write some tests for each one.
Besides file size and whether it is a one-time script, you also need to consider whether there should be error checking (what if there are two indented lines in a row?), etc.
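As a rough illustration of the line-by-line approach, here is a minimal sketch. It assumes each <string> element sits on its own line, and that a value counts as a phone number if it consists only of digits and spaces with an optional leading '+'. The helper names (iter_values, pair_names_with_numbers) are made up for this example.

```python
import io
import re

# Assumption: every value is on its own line as <string>...</string>.
STRING_TAG = re.compile(r"<string>(.*?)</string>")
# Assumption: a phone number is an optional '+' followed by digits and spaces.
PHONE = re.compile(r"\+?[\d ]+")


def iter_values(lines):
    """Yield the cleaned text of each <string> element, one line at a time."""
    for line in lines:
        match = STRING_TAG.search(line)
        if match:
            # Drop surrounding spaces, tabs and commas ("example3, " -> "example3").
            yield match.group(1).strip(" ,\t")


def pair_names_with_numbers(lines):
    """Return (name, number) tuples; a name is kept only if a number follows it."""
    pairs = []
    pending_name = None
    for value in iter_values(lines):
        if PHONE.fullmatch(value):
            if pending_name is not None:
                pairs.append((pending_name, value))
            pending_name = None
        else:
            # A name with no number after it is overwritten by the next name.
            pending_name = value
    return pairs


# Any iterable of lines works here, including an open file object.
sample = io.StringIO(
    "<string>example1</string>\n"
    "    <string>014584111</string>\n"
    "<string>example3, </string>\n"
    "<string>example4 </string>\n"
)
pairs = pair_names_with_numbers(sample)
print(pairs)
```

Because the function only ever holds one pending name, it never loads the whole file into memory. Repeated names that have no number after them are simply dropped.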
Upvotes: 0
Reputation: 23815
See below (do whatever you need to do with data).
import xml.etree.ElementTree as ET

def is_phone_number(value):
    # A value counts as a phone number if it contains only digits, spaces and '+'
    for x in value:
        if x != '+' and x != ' ' and not x.isnumeric():
            return False
    return True

xml = '''<r> <string>example1</string>
<string>014584111</string>
<string>example2</string>
<string>04561212123</string>
<string>example3</string>
<string>+1 156151561</string>
<string>example4</string>
<string>564513212</string>
<string>example3, </string>
<string>example4 </string></r>'''

data = []
root = ET.fromstring(xml)
strings = root.findall('.//string')
i = 0
# Walk the elements two at a time as (name, number) pairs;
# stopping at len - 1 avoids an IndexError on a trailing unpaired element
while i + 1 < len(strings):
    if is_phone_number(strings[i+1].text):
        data.append({'key': strings[i].text,'value':strings[i+1].text})
    i += 2
print(data)
Output:
[{'key': 'example1', 'value': '014584111'}, {'key': 'example2', 'value': '04561212123'}, {'key': 'example3', 'value': '+1 156151561'}, {'key': 'example4', 'value': '564513212'}]
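To get from this list to the CSV the question asks for, something like the following could work. It is only a sketch: it assumes data has the shape printed above, takes the column names from the question's example output, and uses a hypothetical helper called write_csv. It writes to an in-memory buffer so it can be run as-is.

```python
import csv
import io

# Assumption: same shape as the list printed above (shortened here).
data = [
    {'key': 'example1', 'value': '014584111'},
    {'key': 'example2', 'value': '04561212123'},
]


def write_csv(rows, out):
    """Write a header row plus one (name, number) row per entry to `out`."""
    writer = csv.writer(out)
    writer.writerow(['FullName', 'PhoneNumber'])
    for row in rows:
        writer.writerow([row['key'], row['value']])


buffer = io.StringIO()
write_csv(data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

For a real file, pass a file object opened with newline='' (as the csv module documentation recommends) instead of the StringIO buffer.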
Upvotes: -1