Reputation: 1587

Create a line plot with 2 series split by column value

I'm struggling with what should be quite an easy task: creating a line plot with 2 series. I have managed to do it, but I don't think my approach is the fastest. Does anyone know a faster or smarter way?

The problem is that the values for both series are in the same column, 'values', and to get the two series I have to split them according to the 'category' column. So far I have done this with a few transformations before plotting, which does not seem like the fastest solution. Does anyone know a way to make this plot without the transformations in my code below?

My code:

import numpy.random as r
import pandas as pn

# generate values
values = r.random_sample(200)
# labels 1..100 repeated once per category (list() is needed on Python 3)
labels = list(range(1, 101)) * 2
category = list(100 * 'a' + 100 * 'b')

# create dataframe
df = pn.DataFrame({'labels': labels,
                   'values': values,
                   'category': category})

### I tried to create the plot here but was unsuccessful so far, so I needed the transformation below.

#transformation
df =df.set_index('labels')

dfA= df[df['category']=='a']
del dfA['category']
dfA.columns=['values_a']

dfB=df[df['category']=='b']
del dfB['category']
dfB.columns=['values_b']

#joining
frames=[dfA,dfB]
dff= pn.concat(frames, axis=1)

#plotting
dff.plot()

Thank you in advance for help!

Upvotes: 3

Answers (3)

manu190466

Reputation: 1603

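You can use seaborn to draw this as a scatter plot (lmplot also fits a regression line per category by default):

import seaborn as sns
# x and y must be passed as keywords in recent seaborn versions
sns.lmplot(x='labels', y='values', data=df, hue='category')

If you prefer a line plot:

import seaborn as sns
sns.pointplot(x='labels', y='values', data=df, hue='category')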

Upvotes: 1

Stop harming Monica

Reputation: 12590

You do have to transform your data since you do not want to plot your columns as they are. But there is an easier way:

>>> df.pivot(index='labels', columns='category', values='values').head()
category         a         b
labels                      
1         0.133046  0.762676
2         0.717739  0.774000
3         0.059960  0.547297
4         0.464269  0.951537
5         0.227428  0.987621
>>> df.pivot(index='labels', columns='category', values='values').plot()
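
For anyone who wants to run this end to end on Python 3, here is a minimal sketch of the same pivot approach. The helper names make_long_frame and to_wide are made up for illustration, and the fixed seed is only an assumption to make the data reproducible. Note that pivot raises a ValueError if the same (labels, category) pair occurs more than once; in that case pivot_table with an aggregation function is one alternative.

```python
import numpy as np
import pandas as pd


def make_long_frame(n=100, seed=0):
    """Build a long-format frame shaped like the one in the question.

    Hypothetical helper: the seed and the function name are assumptions.
    """
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "labels": list(range(1, n + 1)) * 2,   # 1..n once per category
        "values": rng.random(2 * n),
        "category": ["a"] * n + ["b"] * n,
    })


def to_wide(df):
    """Reshape to one column per category, indexed by label."""
    return df.pivot(index="labels", columns="category", values="values")


if __name__ == "__main__":
    wide = to_wide(make_long_frame())
    # One line per category; add plt.show() when running as a plain script.
    wide.plot()
```

The plot call is the same one-liner as above; the reshaping is split out only so the intermediate wide frame can be inspected.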

Upvotes: 2

jezrael

Reputation: 862581

IIUC, you can use concat with the keys parameter to set the column names:

#transformation
df = df.set_index('labels')

dff = pn.concat([df.loc[df['category']=='a', 'values'],
                 df.loc[df['category']=='b', 'values']], 
        axis=1,  
        keys=['values_a', 'values_b'])

print(dff)
       values_a  values_b
labels                    
1       0.240131  0.083861
2       0.137078  0.788497
3       0.017947  0.985262
4       0.053830  0.882618
5       0.772023  0.753158
6       0.258116  0.322541
7       0.837611  0.188269
8       0.551581  0.599734
...          ...       ...
...          ...       ...
...          ...       ...
93      0.413466  0.794807
94      0.791670  0.186960
95      0.033857  0.070732
96      0.805209  0.570014
97      0.691454  0.125113
98      0.564201  0.104882
99      0.656381  0.176520
100     0.007758  0.340838

[100 rows x 2 columns]

EDIT: You can omit concat and set the legend with ax.legend:

import matplotlib.pyplot as plt

plt.figure()
df.loc[df['category']=='a', 'values'].plot()
ax = df.loc[df['category']=='b', 'values'].plot()
ax.legend(['values_a','values_b'])
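
The same idea can be sketched for any number of categories with a groupby loop, so the category values do not have to be hard-coded. This is only an illustration: plot_by_category is a hypothetical helper, and it assumes 'labels' is still a regular column (that is, set_index('labels') has not been applied).

```python
import pandas as pd


def plot_by_category(df, ax=None):
    """Draw one line per distinct 'category' value on a shared Axes.

    Hypothetical helper; assumes the columns 'labels', 'values' and
    'category' from the question.
    """
    for name, group in df.groupby("category"):
        # With ax=None on the first pass pandas creates the Axes;
        # later passes reuse it so all lines land on one plot.
        ax = group.plot(x="labels", y="values", ax=ax,
                        label=f"values_{name}")
    return ax
```

Calling ax = plot_by_category(df) on the original long-format frame should give one line per category without building an intermediate wide frame.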

Upvotes: 2
