scott martin
scott martin

Reputation: 1293

Pandas - Exploding data into multiple rows and creating a new DataFrame

I have a DataFrame that looks as below:

id| products| quantity| price| time
101| prod_a, prod_b| 2, 1| 10,5| 2020-01-01, 2020-01-03
102| prod_a| 2| 5| 2020-01-03
103| prod_c| 21| 32| 2020-03-02

I have each column that has a particular metric value stored. I am trying to break this such that I can get the below structure:

id, products, quantity, price, time
101, prod_a, 2, 10, 2020-01-01 
101, prod_b, 1, 5, 2020-01-03 
102, prod_a, 2, 5, 2020-01-03
103, prod_c, 21, 32, 2020-03-02

I tried to convert each of the column to a Pandas series as below and then try concat into a master DataFrame as below:

s1 = df.products.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
s2 = df.quantity.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
s3 = df.price.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
s4 = df. time.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)


df = pd.concat([s1,s2,s3, s4], axis=1, keys=['products', 'quantity', 'price', 'time]).reset_index()

However doing the above throws the below error

ValueError: cannot reindex from a duplicate axis

Upvotes: 0

Views: 59

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

One way is to do it like this, using pd.Series.explode:

df.set_index('id').apply(lambda x: x.str.split(',').explode()).reset_index()

Output:

    id  products  quantity  price         time
0  101    prod_a         2     10   2020-01-01
1  101    prod_b         1      5   2020-01-03
2  102    prod_a         2      5   2020-01-03
3  103    prod_c        21     32   2020-03-02

Upvotes: 3

Related Questions