Reputation: 11
I'm trying to use plotly to plot a sankey diagram of how my customers move between sales buckets each week. Right now, my dataframe looks like this:
account_id | week | sales_bucket
1 | 2019-06-10 | $0-20
1 | 2019-06-17 | $21-40
1 | 2019-06-24 | $0-20
2 | 2019-06-10 | $40-60
2 | 2019-06-17 | $40-60
3 | 2019-06-17 | $60-80
4 | 2019-06-10 | $40-60
Essentially I want to track if customers are moving into higher buckets week over week. But not every customer makes a transaction every week, and of course there are new customers every week as well. I'm having trouble wrapping my head around how I can quickly get my dataframe into the source-sink-count
structure that plotly requires from the dataframe that I have above.
Upvotes: 0
Views: 1666
Reputation: 565
Assuming that yours links should be something like:
label = [
"Week 1 - Bucket 1",
"Week 1 - Bucket 2",
"Week 1 - Bucket 3",
"Week 2 - Bucket 1",
...
]
Then source
and target
(flow from source label to target label index) are in principle:
source = [0, 0, 0, 1, 1, 1, 2, 2, 2, ...]
target = [3, 4, 5, 3, 4, 5, 3, 4, 5, ...]
The last bit is to calculate the weights in the links. For that you can use group_by
and agg
from pandas
:
value = df.groupby(['Week', 'Sales Bucket'])['Id'].agg('sum').values.flatten()
Upvotes: 1