R Surdhar
R Surdhar

Reputation: 11

Don't understand how to plot my dataframe in a Sankey diagram

I'm trying to use plotly to plot a sankey diagram of how my customers move between sales buckets each week. Right now, my dataframe looks like this:

account_id |    week    | sales_bucket
     1     | 2019-06-10 |   $0-20
     1     | 2019-06-17 |   $21-40
     1     | 2019-06-24 |   $0-20
     2     | 2019-06-10 |   $40-60
     2     | 2019-06-17 |   $40-60
     3     | 2019-06-17 |   $60-80
     4     | 2019-06-10 |   $40-60

Essentially I want to track if customers are moving into higher buckets week over week. But not every customer makes a transaction every week, and of course there are new customers every week as well. I'm having trouble wrapping my head around how I can quickly get my dataframe into the source-sink-count structure that plotly requires from the dataframe that I have above.

Upvotes: 0

Views: 1666

Answers (1)

Assuming that yours links should be something like:

label = [
  "Week 1 - Bucket 1", 
  "Week 1 - Bucket 2", 
  "Week 1 - Bucket 3", 
  "Week 2 - Bucket 1", 
  ...
  ]

Then source and target (flow from source label to target label index) are in principle:

source = [0, 0, 0, 1, 1, 1, 2, 2, 2, ...]
target = [3, 4, 5, 3, 4, 5, 3, 4, 5, ...]

The last bit is to calculate the weights in the links. For that you can use group_by and agg from pandas:

value = df.groupby(['Week', 'Sales Bucket'])['Id'].agg('sum').values.flatten()

Upvotes: 1

Related Questions