Reputation: 480
I have a data frame with the following columns:
df = pd.read_csv('edtech.csv')
print(df.head())
   Unnamed: 0                                         Title      Date  Country  \
0           3     Apple acquires edtech company LearnSprout  15-01-16       US
1           9  LearnLaunch Accelerator launches new program  15-01-16       US
2          15                   Flex Class raises financing  15-01-16    India
3          16               Grovo raises Series C financing  15-01-16       US
4          17                    Myly raises seed financing  15-01-16    India

                          Segment
0             Tools for Educators
1     Accelerators and Incubators
2  Adult and Continuing Education
3               Platforms and LMS
4                     Mobile Apps
Now I want to create a scatter plot with 'Country' on one axis and 'Segment' on the other. E.g. for US and 'Tools for Educators', there will be one dot on the chart.
How do I convert this dataframe into numbers that I can render as a scatter plot? I can get the chart in Tableau by using a count, but I don't know exactly how that works behind the scenes.
Would be grateful if anyone can help me out. TIA
Upvotes: 0
Views: 1447
Reputation: 3212
I don't know whether a scatter plot of two non-numeric categorical variables is possible directly. The closest I could get to what you want is to create counts with groupby, reshape the data with pivot, and draw a heatmap using seaborn:
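import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('edtech.csv')
dd = df[['Country', 'Segment', 'Title']]
# Count the titles in each (Country, Segment) pair
gg = dd.groupby(['Country', 'Segment'], as_index=False).count().rename(columns={"Title": "Number"})
# Reshape: one row per country, one column per segment; missing pairs become 0
gp = gg.pivot(columns="Segment", index="Country", values="Number").fillna(0.0)
sns.heatmap(gp, cbar=False)
plt.show()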
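If you do want dots rather than a heatmap, the same counts can probably be passed to a plain matplotlib scatter plot, since recent matplotlib versions accept strings as category labels on either axis. The following is only a sketch under some assumptions: the sample rows are copied from the df.head() output in the question instead of being read from edtech.csv, the helper names count_pairs and plot_counts are made up for illustration, and the factor of 100 used for the dot size is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample rows copied from the df.head() output in the question.
# With the real file you would use: df = pd.read_csv('edtech.csv')
df = pd.DataFrame({
    "Title": [
        "Apple acquires edtech company LearnSprout",
        "LearnLaunch Accelerator launches new program",
        "Flex Class raises financing",
        "Grovo raises Series C financing",
        "Myly raises seed financing",
    ],
    "Country": ["US", "US", "India", "US", "India"],
    "Segment": [
        "Tools for Educators",
        "Accelerators and Incubators",
        "Adult and Continuing Education",
        "Platforms and LMS",
        "Mobile Apps",
    ],
})


def count_pairs(frame):
    """Return one row per (Country, Segment) pair with its row count."""
    return (
        frame.groupby(["Country", "Segment"])
        .size()
        .reset_index(name="Number")
    )


def plot_counts(counts, scale=100):
    """Draw one dot per (Country, Segment) pair, sized by its count.

    `scale` is an arbitrary multiplier for the marker area (an assumption).
    """
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(
        counts["Country"].tolist(),   # strings become categorical x positions
        counts["Segment"].tolist(),   # strings become categorical y positions
        s=(counts["Number"] * scale).tolist(),
    )
    ax.set_xlabel("Country")
    ax.set_ylabel("Segment")
    fig.tight_layout()
    return ax


counts = count_pairs(df)
ax = plot_counts(counts)
# Call plt.show() here to display the figure interactively.
```

Each combination that occurs in the data gets exactly one dot, and combinations that occur more often get a larger dot, which is roughly what a count-based chart in Tableau shows.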
Upvotes: 1