Reputation: 808
I'm trying to create a scatterplot in excel with the following data:
import pandas as pd
import seaborn as sns
import numpy as np
df = pd.DataFrame(data=np.random.random((100, 2)), columns=['X', 'Y'])
df['category'] = np.random.choice(['groupA', 'groupB', 'groupC'], size=len(df))
df.head()
X Y category
0 0.407807 0.269535 groupB
1 0.658499 0.671735 groupB
2 0.071938 0.199920 groupC
3 0.197185 0.358241 groupA
4 0.033922 0.917605 groupB
In seaborn I would just do something like:
sns.scatterplot(x='X', y='Y', data=df, hue='category')
But I am unable to create this is in excel. Do I really have to create a new column for each category?
Upvotes: 0
Views: 1482
Reputation: 11978
Not sure if there is a better way to do this in Excel but you need to resume your data using a Pivot Table, and then copy/paste the output as range somewhere else and then do your scatterplot (You can't do an scatterplot directly taking a Pivot Table as a source):
The setup of my Pivot Table is:
X
to rows sectionCategory
to Columns sectionY
to values section, set it to Sum
Anyways this method got an issue: If there are 2 or more points in same category with same X value but different Y values, then Pivot Table will sum them up and that's an error.
What you need to create the scatterlot is a source with the structure showed in step 3. It there is any kind od semiduplicates as mentioned before, I'm afraid you'll need to construct manually (or VBA code) your dataframe with this structure:
Do I really have to create a new column for each category? I'm afraid that yes, you do
Upvotes: 1