Reputation: 97
I have this list scraped from the website, I want to convert this to pandas dataframe with columns:
ID,name,PMID,Source,Family,Value,weight
and appropriate values in each row with their associated id
[['ID'],
['243'],
['Name'],
['ABC'],
['PMID'],
['--NA--'],
['Source'],
['ABC'],
['Family'],
['XYZ'],
['Value'],
['45'],
['Weight'],
['200'],
['ID'],
['246'],
['Name'],
['PaS'],
['PMID'],
['--NA--'],
['Source'],
['APM'],
['Family'],
['Lau'],
['Value'],
['120'],
['Weight'],
['856'],
...]
Kindly help me with this. Thanks.
Upvotes: 1
Views: 48
Reputation: 75080
You can flatten the list first and then use cumulative sum based on your expected columns to kinda pivot them, note that l
is the name of the input list.
import itertools
l1 = ['ID','Name','PMID','Source','Family','Value','Weight']
s = pd.Series(itertools.chain.from_iterable(l))
s1 = s.map(dict(zip(*pd.factorize(l1)[::-1]))).ffill()
out = (pd.concat([pd.Series(g.dropna().iloc[1:],name=g.iloc[0]).reset_index(drop=True)
for i,g in s.mask(s.duplicated() & s.isin(l1)).groupby(s1)],axis=1))
print(out)
ID Name PMID Source Family Value Weight
0 243 ABC --NA-- ABC XYZ 45 200
1 246 PaS --NA-- APM Lau 120 856
Upvotes: 2
Reputation: 153460
You can try this:
import pandas as pd
res = [['ID'],
['243'],
['Name'],
['ABC'],
['PMID'],
['--NA--'],
['Source'],
['ABC'],
['Family'],
['XYZ'],
['Value'],
['45'],
['Weight'],
['200'],
['ID'],
['246'],
['Name'],
['PaS'],
['PMID'],
['--NA--'],
['Source'],
['APM'],
['Family'],
['Lau'],
['Value'],
['120'],
['Weight'],
['856']]
dfi = pd.DataFrame(np.array(res).reshape(-1,2))
grp = (dfi[0] == 'ID').cumsum()
dfi.set_index([grp, 0])[1].unstack()
Output:
0 Family ID Name PMID Source Value Weight
0
1 XYZ 243 ABC --NA-- ABC 45 200
2 Lau 246 PaS --NA-- APM 120 856
Upvotes: 2