SeekKnowledge
SeekKnowledge

Reputation: 11

How to parse data from column in python pandas

I have a csv file contain the following:

original

and need to parse it and the expected results:

result

Data in text:

id,product_id
    
1,[{'p_id': 59, 'p_name': 'IPF'}, {'p_id': 63, 'p_name': 'RBC'}, {'p_id': 47, 'p_name': 'CSP'}]

2,[{'p_id': 25, 'p_name': 'LPP'}, {'p_id': 86, 'p_name': 'CRS'}, {'p_id': 47, 'p_name': 'CSP'}]

3,[{'p_id': 73, 'p_name': 'OCC'}, {'p_id': 63, 'p_name': 'RBC'}]

4,[{'p_id': 63, 'p_name': 'RBC'}, {'p_id': 31, 'p_name': 'SUT'}, {'p_id': 73, 'p_name': 'OCC'}]

5,[{'p_id': 63, 'p_name': 'RBC'}]

Upvotes: 0

Views: 1516

Answers (1)

ThePyGuy
ThePyGuy

Reputation: 18416

As I have already mentioned in the comment, the data you have doesn't have string values enclosed inside quote, for e.g. : in [{'p_id': 59, 'p_name': IPF}, the value IPF is not enclosed by quote, so you can not use any direct method.

Out of several way, the easiest way is to use yaml (pip install pyyaml) package to parse those string values as Python object, then explode and apply pd.Series:

import pandas as pd
import yaml

filePath = 'file.csv'
df = pd.read_csv(filePath, index_col=0)
out = (df['product_id'].apply(lambda x: yaml.load(x, yaml.Loader))
        .explode()
        .apply(pd.Series)
       )

OUTPUT

>>> out
    p_id p_name
id             
1     59    IPF
1     63    RBC
1     47    CSP
2     25    LPP
2     86    CRS
2     47    CSP
3     73    OCC
3     63    RBC
4     63    RBC
4     31    SUT
4     73    OCC
5     63    RBC

Upvotes: 1

Related Questions