Reputation: 45
Reposting it as this was incorrectly marked as duplicated. This post was linked but does not answer my issue
I am new to Python, I have a text string that looks like this. I need help in converting it to a table. I tried doing it by creating dictionaries, however, the number of columns in each row are not always the same, this is creating an issue. Also, there are columns like "stock" in the text, that I do not need in the final output
After removing empty lines and other info. the text file looks like this.
XYZ
XYZ
ABC
ABC
MNP
MNP
Fruit
Apple
price
30
Number
10
Fruit
kiwi
stock
10
Number
20
Fruit
grape
price
12
This is my desired output in a table format, the second row should have empty value left for price third row should have an empty value for Number.
Fruit price Number
Apple 30 10
kiwi 20
grape 12
Upvotes: 1
Views: 281
Reputation: 23089
Here's my solution that I wanted to put on the first version of this question, in case you don't want to use Pandas:
#!/usr/bin/env python
import re
data = """
XYZ
XYZ
ABC
ABC
MNP
MNP
Fruit
Apple
price
30
Number
10
Fruit
kiwi
Number
20
Fruit
grape
price
12"""
def doit(data):
table = []
data = re.split(r'\s+', data)
currentFruit = None
while len(data):
line = data.pop(0)
if line == "Fruit":
if currentFruit:
table.append(currentFruit)
currentFruit = { 'name': data.pop(0) }
elif currentFruit:
currentFruit[line] = data.pop(0)
table.append(currentFruit)
print "%-9s%-9s%-9s" % ("Fruit", "price", "Number")
for fruit in table:
print "%-9s%-9s%-9s" % (fruit['name'],
fruit['price'] if 'price' in fruit else '',
fruit['Number'] if 'Number' in fruit else '')
doit(data)
Upvotes: 0
Reputation: 8270
You can use pandas
to create such table:
import pandas as pd
text = '''XYZ
XYZ
ABC
ABC
MNP
MNP
Fruit
Apple
price
30
Number
10
Fruit
kiwi
Number
20
Fruit
grape
price
12'''
data = {'Fruit': [], 'price': [], 'Number': []}
lines = text.split()
for i in range(len(lines)):
if i+5 < len(lines) and lines[i] == 'Fruit' and lines[i+2] == 'price' and lines[i+4] == 'Number':
data['Fruit'].append(lines[i+1])
data['price'].append(lines[i+3])
data['Number'].append(lines[i+5])
elif i+3 < len(lines) and lines[i] == 'Fruit' and lines[i+2] == 'Number':
data['Fruit'].append(lines[i+1])
data['price'].append('')
data['Number'].append(lines[i+3])
elif i+3 < len(lines) and lines[i] == 'Fruit' and lines[i+2] == 'price':
data['Fruit'].append(lines[i+1])
data['price'].append(lines[i+3])
data['Number'].append('')
df = pd.DataFrame(data)
print(df)
Result:
Fruit price Number
0 Apple 30 10
1 kiwi 20
2 grape 12
Also you can save result to CSV:
df.to_csv('result.csv')
Upvotes: 1