peace
peace

Reputation: 389

pandas- how to calculate the percentage for each sankey nodes

I exported some raw data from splunk, and I have transform the raw data to source, target and value based on dc, customer and companyID column by using python pandas, Below is part of sample data.

dc,customer,companyID,source,target,value

I am using powerBI to display the sankey chart by creating a html custom visual with networkD3 library. So i want to make all the data ready in a csv file locally and then upload the powerBI to avoid any data computing in powerBI.

So how to calculate the percentage by python pandas like below chart? Basically, the percentage value of the source node should be equal to the sum percentage of each target nodes.

The percentage I want looks like below enter image description here

Upvotes: 0

Views: 1834

Answers (1)

keramat
keramat

Reputation: 4543

Use:

    temp = [['dc1','c1','cid1','view_page_mk_1','open_opp_2',10],
    ['dc2','c2','cid2','view_page_mk_1','open_opp_2',5],
    ['dc1','c1','cid1','view_page_as_1','open_opp_2',10],
    ['dc2','c1','cid1','view_page_cp_1','view_opp_2',33],
    ['dc3','c3','cid3','open_opp_2','show_more_3',8],]

df = pd.DataFrame(temp)
df.columns = ['dc','customer','companyID','source','target','value']
df['percent'] = df['value']/df['value'].sum()
df.groupby('source')['percent'].sum()

output:

enter image description here

Upvotes: 0

Related Questions