Reputation: 389
I exported some raw data from splunk, and I have transform the raw data to source, target and value based on dc, customer and companyID column by using python pandas, Below is part of sample data.
dc,customer,companyID,source,target,value
I am using powerBI to display the sankey chart by creating a html custom visual with networkD3 library. So i want to make all the data ready in a csv file locally and then upload the powerBI to avoid any data computing in powerBI.
So how to calculate the percentage by python pandas like below chart? Basically, the percentage value of the source node should be equal to the sum percentage of each target nodes.
The percentage I want looks like below
Upvotes: 0
Views: 1834
Reputation: 4543
Use:
temp = [['dc1','c1','cid1','view_page_mk_1','open_opp_2',10],
['dc2','c2','cid2','view_page_mk_1','open_opp_2',5],
['dc1','c1','cid1','view_page_as_1','open_opp_2',10],
['dc2','c1','cid1','view_page_cp_1','view_opp_2',33],
['dc3','c3','cid3','open_opp_2','show_more_3',8],]
df = pd.DataFrame(temp)
df.columns = ['dc','customer','companyID','source','target','value']
df['percent'] = df['value']/df['value'].sum()
df.groupby('source')['percent'].sum()
output:
Upvotes: 0