Reputation: 11
I have a csv file contain the following:
and need to parse it and the expected results:
Data in text:
id,product_id
1,[{'p_id': 59, 'p_name': 'IPF'}, {'p_id': 63, 'p_name': 'RBC'}, {'p_id': 47, 'p_name': 'CSP'}]
2,[{'p_id': 25, 'p_name': 'LPP'}, {'p_id': 86, 'p_name': 'CRS'}, {'p_id': 47, 'p_name': 'CSP'}]
3,[{'p_id': 73, 'p_name': 'OCC'}, {'p_id': 63, 'p_name': 'RBC'}]
4,[{'p_id': 63, 'p_name': 'RBC'}, {'p_id': 31, 'p_name': 'SUT'}, {'p_id': 73, 'p_name': 'OCC'}]
5,[{'p_id': 63, 'p_name': 'RBC'}]
Upvotes: 0
Views: 1516
Reputation: 18416
As I have already mentioned in the comment, the data you have doesn't have string values enclosed inside quote, for e.g. : in [{'p_id': 59, 'p_name': IPF}
, the value IPF
is not enclosed by quote, so you can not use any direct method.
Out of several way, the easiest way is to use yaml
(pip install pyyaml
) package to parse those string values as Python object, then explode
and apply pd.Series
:
import pandas as pd
import yaml
filePath = 'file.csv'
df = pd.read_csv(filePath, index_col=0)
out = (df['product_id'].apply(lambda x: yaml.load(x, yaml.Loader))
.explode()
.apply(pd.Series)
)
OUTPUT
>>> out
p_id p_name
id
1 59 IPF
1 63 RBC
1 47 CSP
2 25 LPP
2 86 CRS
2 47 CSP
3 73 OCC
3 63 RBC
4 63 RBC
4 31 SUT
4 73 OCC
5 63 RBC
Upvotes: 1