Reputation: 173
So the data frame is
Product Price Weight Range Count
A 40 20 1-3 20
A 40 20 4-7 23
B 20 73 1-3 54
B 20 73 4-7 43
B 20 73 8-15 34
B 20 73 >=16 12
C 10 20 4-7 22
So basically there is a product with price and weight and the range here specifies the no of days the product was sold continuously and the count specifies the count of products sold in that range
Expected Output
Product Price Weight Range Count
A 40 20 1-3 20
4-7 23
B 20 73 1-3 54
4-7 43
8-15 34
B 20 73 >=16 12
C 10 20 4-7 22
or
Product Price Weight 1-3 4-7 8-15 >=16
A 40 20 20 23 NaN NaN
B 20 73 54 43 34 1
C 10 20 0 22 NaN NaN
Upvotes: 2
Views: 116
Reputation: 862891
In my opinion first solution is not recommended if need processes DataFrame
later.
Second solution is much better and if duplicates in real data is necessary aggregate values, e.g. by sum
:
#convert catagoricals to strings
df['Range'] = df['Range'].astype(str)
df = (df.groupby(['Product', 'Price', 'Weight', 'Range'])['Count']
.sum()
.unstack(fill_value=0)
.reset_index())
print (df)
Range Product Price Weight 1-3 4-7 8-15 >=16
0 A 40 20 20 23 0 0
1 B 20 73 54 43 34 12
2 C 10 20 0 22 0 0
Upvotes: 2
Reputation: 11192
try this,
mask=df.duplicated(subset=['Product'])
df.loc[mask,['Product','Price','Weight']]=''
Output:
Product Price Weight Range Count
0 A 40 20 1-3 20
1 4-7 23
2 B 20 73 1-3 54
3 4-7 43
4 8-15 34
5 >=16 12
6 C 10 100 4-7 22
.
pd.pivot_table(df,index=['Product','Price','Weight'],columns='Range',values='Count').reset_index()
Output:
Range Product Price Weight 1-3 4-7 8-15 >=16
0 A 40 20 20.0 23.0 NaN NaN
1 B 20 73 54.0 43.0 34.0 12.0
2 C 10 100 NaN 22.0 NaN NaN
Upvotes: 1
Reputation: 402633
Fulfilling the second output makes more sense than the first. Use set_index
, followed by unstack
.
(df.set_index(['Product', 'Price', 'Weight', 'Range'])
.Count
.unstack(fill_value=0)
.reset_index()
)
Range Product Price Weight 1-3 4-7 8-15 >=16
0 A 40 20 20 23 0 0
1 B 20 73 54 43 34 12
2 C 10 100 0 22 0 0
Upvotes: 3