remus2232
remus2232

Reputation: 87

Pandas - split columns and count occurrences

It's info for some purchases made by clients on phone accessories, my real data would look something like this: Abstract Model 1 ~Samsung S6 | Sold: 4

I've got a dataset that looks something like this:

item               sold
Design1 ~Model1 1
Design2 ~Model1 2
Design1 ~Model2 3
Design2 ~Model2 1

I want to break the item column into 2 columns , design and model, and count each time a design has been sold, and a model has been sold, individually, based on the selling data of design+model combinations in the input.

My expected output, based on the first dataset, would look something like this:

expected output:        

design    design_sold    model      model_sold

Design1   4              Model1     3
Design2   3              Model2     4

Upvotes: 1

Views: 1171

Answers (1)

Mohamed Thasin ah
Mohamed Thasin ah

Reputation: 11192

try this,

df[['Design','Model']]=df['item'].str.split(' ~',expand=True)
print pd.concat([df.groupby('Design',as_index=False)['sold'].sum().rename(columns={'sold':'Desgin Sold'}),df.groupby('Model',as_index=False)['sold'].sum().rename(columns={'sold':'Model Sold'})],axis=1)

Output:

    Design  Desgin Sold   Model  Model Sold
0  Design1            4  Model1           3
1  Design2            3  Model2           4

Explanation:' 1. .str.split() used to split your series into frame.

  1. groupby model and design and perform sum on grouped object.

  2. rename the column and concat your dataframe.

Upvotes: 1

Related Questions