Reputation: 829
I have some files that have a tree-like structure. For example:
A
    Result
      a11
      a12
    Lolim
      a21
      a22
    Uplim
      a31
      a32
B
    Result
      b11
      b12
    Lolim
      b21
      b22
I am interested in parsing these files to obtain a dataframe that looks like this:
Name  Result  Lolim  Uplim
A     a12     a22    a32
B     b12     b22    NA
My idea was to split the file into two parts, A and B, and then split each part into subcategories: Result, Lolim and Uplim for A, and Result and Lolim for B. Finally, each subcategory would be split into its 2 items. That way I would end up with a nested list, and then I would be able to create a dataframe from it. But I don't know how to obtain this nested list.
Or is there another method for this? Can you recommend modules or functions that would be useful?
Upvotes: 3
Views: 326
Reputation: 566
import collections
import pandas as pd

with open("data_tree.dat", "r") as data:
    dct = collections.OrderedDict()
    key = ""
    sub_key = ""
    for line in data:
        if " " not in line:  # single space
            key = line.strip()
            dct[key] = collections.OrderedDict()
        elif " " * 4 in line and " " * 6 not in line:  # 4 spaces
            sub_key = line.strip()
            dct[key][sub_key] = ""
        elif " " * 6 in line:  # 6 spaces
            item = line.strip()
            dct[key][sub_key] = item  # overwrite, last element only

df = pd.DataFrame.from_dict(dct).transpose()
df.columns.names = ["Name"]
df = df[["Result", "Lolim", "Uplim"]]  # if column order matters
df = df.fillna("NA")  # in case you want NA and not NaN
print(df)
Output:
Name Result Lolim Uplim
A       a12   a22   a32
B       b12   b22    NA
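This assumes that data_tree.dat is indented the way the code expects (top-level names unindented, subcategories by 4 spaces, items by 6 spaces) and sits in the directory the script is run from, typically the same folder as the .py file containing the above code.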
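To try the code above without the original file, a small sample can be written first. This is only a sketch: the contents are copied from the question, and the indentation widths (4 spaces for subcategories, 6 for items) are an assumption taken from the comments in the code.

```python
from pathlib import Path

# Sample contents from the question. The indentation widths
# (4 spaces for subcategories, 6 for items) are an assumption.
sample = """\
A
    Result
      a11
      a12
    Lolim
      a21
      a22
    Uplim
      a31
      a32
B
    Result
      b11
      b12
    Lolim
      b21
      b22
"""

# Write the sample into the current working directory.
Path("data_tree.dat").write_text(sample)
```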
Or as a function:
import collections
import pandas as pd

def dat_to_df(path_to_file):
    with open(path_to_file, "r") as data:
        dct = collections.OrderedDict()
        key = ""
        sub_key = ""
        for line in data:
            if " " not in line:
                key = line.strip()
                dct[key] = collections.OrderedDict()
            elif " " * 4 in line and " " * 6 not in line:
                sub_key = line.strip()
                dct[key][sub_key] = ""
            elif " " * 6 in line:
                item = line.strip()
                dct[key][sub_key] = item
    df = pd.DataFrame.from_dict(dct).transpose()
    df.columns.names = ["Name"]
    df = df[["Result", "Lolim", "Uplim"]]
    return df.fillna("NA")

dataframe = dat_to_df("data_tree.dat")
print(dataframe)
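As a variation on the substring checks above, the nesting level could also be read from the number of leading spaces, which also lets blank lines be skipped. The following is a minimal sketch using only the standard library. It assumes the file is indented with spaces (not tabs), that subcategories use at most 4 spaces, and that items use more than 4. `parse_tree` is a hypothetical helper name, not part of any library.

```python
import io


def parse_tree(lines):
    """Return {name: {subcategory: last_item}} from indented lines.

    Hypothetical helper. Depth is inferred from leading spaces:
    0 = name, 1 to 4 = subcategory, more than 4 = item (assumption).
    """
    tree = {}
    name = sub_key = None
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        if indent == 0:
            name = text
            tree[name] = {}
        elif indent <= 4:
            sub_key = text
            tree[name][sub_key] = None
        else:
            tree[name][sub_key] = text  # later items overwrite earlier ones
    return tree


# Shortened sample in the same shape as the question's data.
sample = io.StringIO(
    "A\n"
    "    Result\n"
    "      a11\n"
    "      a12\n"
    "    Lolim\n"
    "      a21\n"
    "      a22\n"
    "B\n"
    "    Result\n"
    "      b11\n"
    "      b12\n"
)
tree = parse_tree(sample)
print(tree)
```

The resulting dictionary can then be handed to `pd.DataFrame.from_dict(tree, orient="index")` to get one row per name, with missing subcategories left as NaN.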
Upvotes: 2