shinchaan

Reputation: 136

mosaic plot with percentage and count values as labels in pandas DF

I have a pandas DataFrame like this:

     LEVEL_1      LEVEL_2    Freq  Percentage
0       HIGH          HIGH   8842      17.684
1    AVERAGE           LOW   2802       5.604
2        LOW           LOW  22198      44.396
3    AVERAGE       AVERAGE   6804      13.608
4        LOW       AVERAGE   2030       4.060
5       HIGH       AVERAGE   3666       7.332
6    AVERAGE          HIGH   2887       5.774
7        LOW          HIGH    771       1.542

I can draw the tiles for LEVEL_1 and LEVEL_2 with:

 from statsmodels.graphics.mosaicplot import mosaic
 mosaic(df, ['LEVEL_1','LEVEL_2'])

I want to show Freq and Percentage at the center of each tile of the mosaic plot. How can I do this?

Upvotes: 1

Views: 3403

Answers (1)

Avi

Reputation: 454

Here's a start. Note that I had to add a row of zeros to the DataFrame for the missing HIGH/LOW combination, so that the labelizer lookup finds an entry for that tile. You can make the labels nicer with string formatting in the lambda function. You'll also want to reorder the headers.

import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic
import io
d = io.StringIO()
d.write("""     LEVEL_1      LEVEL_2    Freq  Percentage\n
       HIGH          HIGH   8842      17.684\n
    AVERAGE           LOW   2802       5.604\n
        LOW           LOW  22198      44.396\n
    AVERAGE       AVERAGE   6804      13.608\n
        LOW       AVERAGE   2030       4.060\n
       HIGH       AVERAGE   3666       7.332\n
    AVERAGE          HIGH   2887       5.774\n
        LOW          HIGH    771       1.542""")
d.seek(0)
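# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(d, skipinitialspace=True, sep=r'\s+')
# DataFrame.append was removed in pandas 2.0, so add the zero row with pd.concat
zero_row = pd.DataFrame([{'LEVEL_1': 'HIGH', 'LEVEL_2': 'LOW', 'Freq': 0, 'Percentage': 0}])
df = pd.concat([df, zero_row], ignore_index=True)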
df = df.sort_values(['LEVEL_1', 'LEVEL_2'])
df = df.set_index(['LEVEL_1', 'LEVEL_2'])
print(df)

mosaic(df['Freq'], labelizer=lambda k: df.loc[k].values);

plot from a Jupyter notebook
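
The string formatting mentioned above could look something like the sketch below. It is a variation, not the code that produced the plot: it passes a plain dict keyed by (LEVEL_1, LEVEL_2) to mosaic instead of the indexed DataFrame, and the names freq, make_label, and draw_mosaic are made up for illustration. The percentages are recomputed from the counts rather than read from the Percentage column, and the lookup uses .get so a combination that is absent from the table gets an empty label.

```python
# Counts keyed by (LEVEL_1, LEVEL_2), copied from the table in the question.
freq = {
    ("HIGH", "HIGH"): 8842,
    ("AVERAGE", "LOW"): 2802,
    ("LOW", "LOW"): 22198,
    ("AVERAGE", "AVERAGE"): 6804,
    ("LOW", "AVERAGE"): 2030,
    ("HIGH", "AVERAGE"): 3666,
    ("AVERAGE", "HIGH"): 2887,
    ("LOW", "HIGH"): 771,
}
total = sum(freq.values())


def make_label(key):
    """Return 'count' and 'percent%' on two lines, or '' for an empty tile."""
    count = freq.get(key, 0)
    if count == 0:
        return ""
    return f"{count}\n{100 * count / total:.1f}%"


def draw_mosaic():
    """Draw the mosaic with the formatted labels (hypothetical helper)."""
    # Imported here so the label logic above can be tried without the
    # plotting libraries installed.
    import matplotlib.pyplot as plt
    from statsmodels.graphics.mosaicplot import mosaic

    fig, _ = mosaic(freq, labelizer=make_label)
    plt.show()
    return fig
```

Calling draw_mosaic() in a notebook cell should draw the tiles with the count on the first line and the percentage below it. If you would rather keep the DataFrame from the code above, the same f-string can go inside the lambda.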

Upvotes: 4
