Reputation: 11
I have a file with data similar to this:
[START]
Name = Peter
Sex = Male
Age = 34
Income[2020] = 40000
Income[2019] = 38500
[END]
[START]
Name = Maria
Sex = Female
Age = 28
Income[2020] = 43000
Income[2019] = 42500
Income[2018] = 40000
[END]
[START]
Name = Jane
Sex = Female
Age = 41
Income[2020] = 60500
Income[2019] = 57500
Income[2018] = 54000
[END]
I want to read this data into a pandas dataframe so that at the end it is similar to this
Name Sex Age Income[2020] Income[2019] Income[2018]
Peter Male 34 40000 38500 NaN
Maria Female 28 43000 42500 40000
Jane Female 41 60500 57500 54000
So far, I wasn't able to figure out if this is a standard data file format (it has some similarities to JSON but is still very different). Is there an elegant and fast way to read this data to a dataframe?
Upvotes: 0
Views: 127
Reputation: 149065
Elegant I do not know, but easy way, yes. Python is very good at parsing simple formatted text.
Here, [START]
starts a new record, [END]
ends it, and inside a record, you have key = value
lines. You can easily build a custom parser to generate a list of records to feed into a pandas DataFrame:
inblock = False
fieldnames = []
data = []
for line in open(filename):
if inblock:
if line.strip() == '[END]':
inblock = False
elif '=' in line:
k, v = (i.strip() for i in line.split('=', 1))
record[k] = v
if not k in fieldnames:
fieldnames.append(k)
else:
if line.strip() == '[START]':
inblock = True
record = {}
data.append(record)
df = pd.DataFrame(data, columns=fieldnames)
df
is as expected:
Name Sex Age Income[2020] Income[2019] Income[2018]
0 Peter Male 34 40000 38500 NaN
1 Maria Female 28 43000 42500 40000
2 Jane Female 41 60500 57500 54000
Upvotes: 3