Reputation: 1293
I have a DataFrame that looks as below:
id| products| quantity| price| time
101| prod_a, prod_b| 2, 1| 10,5| 2020-01-01, 2020-01-03
102| prod_a| 2| 5| 2020-01-03
103| prod_c| 21| 32| 2020-03-02
I have each column that has a particular metric value stored. I am trying to break this such that I can get the below structure:
id, products, quantity, price, time
101, prod_a, 2, 10, 2020-01-01
101, prod_b, 1, 5, 2020-01-03
102, prod_a, 2, 5, 2020-01-03
103, prod_c, 21, 32, 2020-03-02
I tried to convert each of the column to a Pandas series as below and then try concat into a master DataFrame as below:
s1 = df.products.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
s2 = df.quantity.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
s3 = df.price.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
s4 = df. time.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
df = pd.concat([s1,s2,s3, s4], axis=1, keys=['products', 'quantity', 'price', 'time]).reset_index()
However doing the above throws the below error
ValueError: cannot reindex from a duplicate axis
Upvotes: 0
Views: 59
Reputation: 153460
One way is to do it like this, using pd.Series.explode
:
df.set_index('id').apply(lambda x: x.str.split(',').explode()).reset_index()
Output:
id products quantity price time
0 101 prod_a 2 10 2020-01-01
1 101 prod_b 1 5 2020-01-03
2 102 prod_a 2 5 2020-01-03
3 103 prod_c 21 32 2020-03-02
Upvotes: 3