data_coder
data_coder

Reputation: 97

How to convert such list of values to dataframe with columns out of that?

I have this list scraped from the website, I want to convert this to pandas dataframe with columns:

ID,name,PMID,Source,Family,Value,weight
and appropriate values in each row with their associated id

[['ID'],
 ['243'],
 ['Name'],
 ['ABC'],
 ['PMID'],
 ['--NA--'],
 ['Source'],
 ['ABC'],
 ['Family'],
 ['XYZ'],
 ['Value'],
 ['45'],
 ['Weight'],
 ['200'],
 ['ID'],
 ['246'],
 ['Name'],
 ['PaS'],
 ['PMID'],
 ['--NA--'],
 ['Source'],
 ['APM'],
 ['Family'],
 ['Lau'],
 ['Value'],
 ['120'],
 ['Weight'],
 ['856'],
...]

Kindly help me with this. Thanks.

Expected Output: Expected output

Upvotes: 1

Views: 48

Answers (2)

anky
anky

Reputation: 75080

You can flatten the list first and then use cumulative sum based on your expected columns to kinda pivot them, note that l is the name of the input list.

import itertools
l1 = ['ID','Name','PMID','Source','Family','Value','Weight']

s = pd.Series(itertools.chain.from_iterable(l))

s1 = s.map(dict(zip(*pd.factorize(l1)[::-1]))).ffill()

out = (pd.concat([pd.Series(g.dropna().iloc[1:],name=g.iloc[0]).reset_index(drop=True)
        for i,g in s.mask(s.duplicated() & s.isin(l1)).groupby(s1)],axis=1))

print(out)

    ID Name    PMID Source Family Value Weight
0  243  ABC  --NA--    ABC    XYZ    45    200
1  246  PaS  --NA--    APM    Lau   120    856

Upvotes: 2

Scott Boston
Scott Boston

Reputation: 153460

You can try this:

import pandas as pd

res = [['ID'],
 ['243'],
 ['Name'],
 ['ABC'],
 ['PMID'],
 ['--NA--'],
 ['Source'],
 ['ABC'],
 ['Family'],
 ['XYZ'],
 ['Value'],
 ['45'],
 ['Weight'],
 ['200'],
 ['ID'],
 ['246'],
 ['Name'],
 ['PaS'],
 ['PMID'],
 ['--NA--'],
 ['Source'],
 ['APM'],
 ['Family'],
 ['Lau'],
 ['Value'],
 ['120'],
 ['Weight'],
 ['856']]

dfi = pd.DataFrame(np.array(res).reshape(-1,2))
grp = (dfi[0] == 'ID').cumsum()
dfi.set_index([grp, 0])[1].unstack()

Output:

0 Family   ID Name    PMID Source Value Weight
0                                             
1    XYZ  243  ABC  --NA--    ABC    45    200
2    Lau  246  PaS  --NA--    APM   120    856

Upvotes: 2

Related Questions