chhibbz

Reputation: 480

Pandas Pyplot: Counting columns for scatter plot

I have a data frame with the following columns:

df = pd.read_csv('edtech.csv')
print(df.head())

   Unnamed: 0                                         Title      Date Country  \
0           3     Apple acquires edtech company LearnSprout  15-01-16      US   
1           9  LearnLaunch Accelerator launches new program  15-01-16      US   
2          15                   Flex Class raises financing  15-01-16   India   
3          16               Grovo raises Series C financing  15-01-16      US   
4          17                    Myly raises seed financing  15-01-16   India   

                          Segment  
0             Tools for Educators  
1     Accelerators and Incubators  
2  Adult and Continuing Education  
3               Platforms and LMS  
4                     Mobile Apps  

Now I want to create a scatter plot with 'Country' on one axis and 'Segment' on the other. E.g. for US and 'Tools for Educators', there will be one dot on the chart.

How do I convert this data frame so that I get numbers that I can render as a scatter plot? I can get the chart in Tableau using a count, but I don't know exactly how that works behind the scenes.

Would be grateful if anyone can help me out. TIA

Upvotes: 0

Views: 1447

Answers (1)

Khris

Reputation: 3212

I don't know whether a scatter plot can be created directly from two non-numerical categorical variables. The closest I could get to what you want is to create the counts with groupby, reshape the data with pivot, and draw a heatmap with seaborn:

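import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('edtech.csv')
# Keep the two categorical columns plus 'Title', which is only used for counting
dd = df[['Country', 'Segment', 'Title']]
# Count the non-null titles for every (Country, Segment) pair
gg = dd.groupby(['Country', 'Segment'], as_index=False).count().rename(columns={"Title": "Number"})
# Reshape to a Country x Segment table; pairs that never occur become 0
gp = gg.pivot(columns="Segment", index="Country", values="Number").fillna(0.0)
sns.heatmap(gp, cbar=False)
plt.show()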
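
If you would still prefer dots to a heatmap, here is a rough sketch of one way it might be done. It assumes a reasonably recent matplotlib, which accepts string values on both axes and treats them as categories. The helper names count_pairs and scatter_counts and the scale parameter are made up for illustration, and the sample frame only repeats the five rows from the question's df.head() so that the snippet runs without edtech.csv.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample rows copied from the question's df.head(); the real data would
# come from pd.read_csv('edtech.csv').
sample = pd.DataFrame({
    "Country": ["US", "US", "India", "US", "India"],
    "Segment": [
        "Tools for Educators",
        "Accelerators and Incubators",
        "Adult and Continuing Education",
        "Platforms and LMS",
        "Mobile Apps",
    ],
})


def count_pairs(df):
    """Return one row per (Country, Segment) pair with its row count."""
    return df.groupby(["Country", "Segment"]).size().reset_index(name="Number")


def scatter_counts(counts, scale=100):
    """Draw one dot per pair; `scale` (a made-up knob) sets the dot area per row."""
    fig, ax = plt.subplots()
    # String columns are plotted as categories on both axes.
    ax.scatter(counts["Country"], counts["Segment"], s=counts["Number"] * scale)
    ax.set_xlabel("Country")
    ax.set_ylabel("Segment")
    fig.tight_layout()
    return fig


counts = count_pairs(sample)
fig = scatter_counts(counts)
# fig.savefig("edtech_scatter.png")  # or plt.show() in an interactive session
```

Pairs that never occur simply get no dot, so there is no need for the fillna step here. With the full file, bigger dots mean more rows for that country and segment.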

Upvotes: 1
