Reputation: 1096
I want to use UpSetPlot with the actual sets I have, but I cannot find any example that uses it this way. The standard example is this:
from upsetplot import generate_counts, plot
example = generate_counts()
plot(example, orientation='vertical')
where the generated example is a Series that looks like this:
cat0   cat1   cat2
False  False  False      56
              True      283
       True   False    1279
              True     5882
True   False  False      24
              True       90
       True   False     429
              True     1957
Name: value, dtype: int64
Is there a way to automatically generate this kind of count structure from the actual elements in the categories cat0, cat1, and cat2?
Reputation: 3643
There are several ways that sets can be used to represent category membership. To help translate them into the format upsetplot requires, the library provides the helpers from_memberships, from_contents, and from_indicators.
See also the Data Format Guide.
Reputation: 1096
Using the tip from @StupidWolf's answer, here is an answer to my own question. Given three sets
set1 = {0,1,2,3,4,5}
set2 = {3,4,5,6,10}
set3 = {0,5,6,7,8,9}
here is the code to draw an UpSet plot for these three sets:
import pandas as pd
from upsetplot import plot
set_names = ['set1', 'set2', 'set3']
all_elems = set1.union(set2).union(set3)
df = pd.DataFrame([[e in set1, e in set2, e in set3] for e in all_elems], columns = set_names)
df_up = df.groupby(set_names).size()
plot(df_up, orientation='horizontal')
To generalize the code above to a list of sets, say sets = [set1, set2, set3], replace its 4th and 5th lines with the following (set_names still needs exactly one name per set):
all_elems = list(set().union(*sets))
df = pd.DataFrame([[e in st for st in sets] for e in all_elems], columns = set_names)
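A minimal sketch of the same generalized approach wrapped in a function, assuming the sets are passed as a dict so that the column names and the sets cannot get out of step. The name sets_to_counts is hypothetical and is not part of upsetplot.

```python
import pandas as pd


def sets_to_counts(named_sets):
    """Hypothetical helper: turn {name: set_of_elements} into an UpSet count Series."""
    set_names = list(named_sets)
    sets = [named_sets[name] for name in set_names]
    all_elems = list(set().union(*sets))
    # One row per element, one boolean column per set.
    df = pd.DataFrame([[e in st for st in sets] for e in all_elems],
                      columns=set_names)
    # Size of every membership combination that actually occurs.
    return df.groupby(set_names).size()


counts = sets_to_counts({
    "set1": {0, 1, 2, 3, 4, 5},
    "set2": {3, 4, 5, 6, 10},
    "set3": {0, 5, 6, 7, 8, 9},
})
print(counts)
```

The result can then be drawn with plot(counts, orientation='horizontal') exactly as in the code above.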
Reputation: 46908
It looks like an ordinary pandas object to me:
import numpy as np
import pandas as pd
from upsetplot import generate_counts, plot
example = generate_counts()
type(example)
pandas.core.series.Series
example.index
MultiIndex([(False, False, False),
(False, False, True),
(False, True, False),
(False, True, True),
( True, False, False),
( True, False, True),
( True, True, False),
( True, True, True)],
names=['cat0', 'cat1', 'cat2'])
So if your DataFrame looks like this:
df = pd.DataFrame(np.random.choice([True,False],(100,3)),
columns=['cat0','cat1','cat2'])
You can do:
example = df.groupby(['cat0','cat1','cat2']).size()
plot(example, orientation='vertical')
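Note that groupby(...).size() only lists the combinations that occur in the data, whereas generate_counts() lists all 2^3 = 8 of them. A minimal sketch, assuming you want empty intersections to appear as zeros too, reindexes the counts against every True/False combination built with pd.MultiIndex.from_product. The tiny DataFrame is made up for illustration.

```python
import pandas as pd

names = ["cat0", "cat1", "cat2"]
# Illustrative boolean membership table: one row per element.
df = pd.DataFrame({
    "cat0": [True, True, False],
    "cat1": [True, False, False],
    "cat2": [False, False, True],
})

counts = df.groupby(names).size()  # only the 3 combinations present

# Every False/True combination for the three categories, in the same
# order as the index of generate_counts().
full_index = pd.MultiIndex.from_product([[False, True]] * len(names),
                                        names=names)
full_counts = counts.reindex(full_index, fill_value=0)
print(full_counts)
```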
I think the limitation is that the cat0, cat1, and cat2 columns have to be boolean, with one row per element and True wherever the element belongs to that category.
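If the data instead arrives as (element, category) pairs, one way to get those boolean columns is pd.crosstab followed by a > 0 test. This is a sketch assuming a long table with hypothetical columns element and category; upsetplot's own from_memberships and from_contents helpers are alternatives.

```python
import pandas as pd

# Hypothetical long-format data: one row per (element, category) membership.
pairs = pd.DataFrame({
    "element":  ["a", "a", "b", "c", "c", "c", "d"],
    "category": ["cat0", "cat1", "cat0", "cat0", "cat1", "cat2", "cat2"],
})

# Count memberships per element and category, then turn counts into booleans.
indicators = pd.crosstab(pairs["element"], pairs["category"]) > 0

# Same groupby as above, now on the derived boolean columns.
cat_names = list(indicators.columns)
counts = indicators.groupby(cat_names).size()
print(counts)
```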