Reputation: 965
I have the following dataframe:-
import pandas as pd
df = pd.read_csv('filename.csv')
print(df)
date organic paid source_type
4/1/2018 39911909.19 38575924.75 Search
4/1/2018 5085939.952 882.608927 Social
4/1/2018 16227439.73 0 Mail
4/1/2018 0 5671871.24 Display Ads
4/1/2018 91215520.23 0 Direct
4/1/2018 15743479.56 0 Referrals
I want to add a column total_sum for all the source types except when source type is "Search". If the source_type is search I want to break down the single row into two and source type becomes organic search and paid search. Inshort a df like below. The summing part is easy to handle i am just stuck with the breaking of rows and conditional column prefix part. Dataframe I need:-
date source_type Total Sum
4/1/2018 Organic Search 39911909.19
4/1/2018 Paid Search 38575924.75
4/1/2018 Social 5086822.561
4/1/2018 Mail 16227439.73
4/1/2018 Display Ads 5671871.24
4/1/2018 Direct 91215520.23
4/1/2018 Referrals 15743479.56
Upvotes: 3
Views: 90
Reputation: 863166
You can split DataFrame by boolean indexing
with Series.eq
for ==
, then reshape first by DataFrame.melt
with new column with Series.str.capitalize
, filter second by invert mask by ~
, sum values with DataFrame.pop
for remove column after and last use concat
:
mask = df['source_type'].eq('Search')
df1 = df[mask].melt(['date','source_type'], value_name='Total Sum')
df1['source_type'] = df1.pop('variable').str.capitalize() + ' Search'
df2 = df[~mask].copy()
df2['Total Sum'] = df2.pop('organic').add(df2.pop('paid'))
df = pd.concat([df1, df2], ignore_index=True)
print (df)
date source_type Total Sum
0 4/1/2018 Organic Search 3.991191e+07
1 4/1/2018 Paid Search 3.857592e+07
2 4/1/2018 Social 5.086823e+06
3 4/1/2018 Mail 1.622744e+07
4 4/1/2018 Display Ads 5.671871e+06
5 4/1/2018 Direct 9.121552e+07
6 4/1/2018 Referrals 1.574348e+07
Upvotes: 3