Reputation: 167
I have a dataframe df1 containing two columns, freq and RN, with the rows sorted in ascending order of freq.
In [2]: df1.head()
Out[2]:
     freq   RN
147     1  181
56      1  848
149     1  814
25      1  829
I want to draw a scatter plot with RN on the x axis and freq on the y axis, where the x values are arranged in ascending order of the y values, i.e. I want the x axis to be ordered as 841, 848, 835, ... as given in df1, which has been sorted in ascending order of freq.
Now if I write plt.scatter('RN', 'freq', data=df1)
the x axis in the output is not ordered by ascending freq. It is arranged in its own natural ascending order, like 800, 801, ..., 860.
Note: plt.bar('RN', 'freq', data=df1)
works the way I want.
How do I change it?
Upvotes: 1
Views: 8710
Reputation: 62403
If the RN column is numeric, the plot API will sort it numerically.
To fix this, convert the RN column type to str.
This works if the values in RN are unique. If they are not unique, all the freq values for a non-unique RN will be plotted together.
If RN is not unique, there's no way for the plot API to differentiate one value from another.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# create test data
np.random.seed(365)
data = {'freq': np.random.randint(20, size=(20,)), 'RN': np.random.randint(800, 900, size=(20,))}
df = pd.DataFrame(data)
# convert RN to a str type
df.RN = df.RN.astype(str)
# sort freq
df.sort_values('freq', ascending=True, inplace=True)
# plot
plt.scatter('RN', 'freq', data=df)
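As a quick check that string x values are placed in row order rather than numeric order, here is a minimal sketch. The three-row toy data is invented for illustration, and the Agg backend is assumed only so the snippet runs without a display. It asks the x axis which position it assigned to each label after plotting.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# toy data (hypothetical), deliberately not in numeric RN order once sorted by freq
df = pd.DataFrame({"freq": [3, 1, 2], "RN": [814, 848, 829]})
df = df.sort_values("freq")
df["RN"] = df["RN"].astype(str)  # strings are treated as categories

fig, ax = plt.subplots()
ax.scatter("RN", "freq", data=df)

# row order after sorting by freq
row_order = df["RN"].tolist()
# ask the categorical x axis where it placed each label
positions = ax.xaxis.convert_units(row_order)
print(row_order, list(positions))
```

If the categories are laid out in order of first appearance, as the answer describes, the positions should come out as consecutive integers starting at 0, matching the row order of the sorted dataframe.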
Use pandas.DataFrame.groupby to group non-unique RNs together:
# create test data
np.random.seed(365)
data = {'freq': np.random.randint(20, size=(20,)), 'RN': np.random.randint(800, 900, size=(20,))}
df = pd.DataFrame(data)
# convert RN to a str type
df.RN = df.RN.astype(str)
# combine non-unique RN with groupby and sort by freq
dfg = df.groupby('RN', as_index=False)['freq'].sum().sort_values('freq')
# plot
plt.scatter('RN', 'freq', data=dfg)
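To make the effect of the groupby step concrete, here is a small sketch on hand-written toy data (the values are invented for illustration). It shows how duplicate RN labels collapse into one row each, with their freq values summed, before sorting. Note that summing is the aggregation used in the snippet above; whether a sum, mean, or something else is appropriate depends on what freq represents.

```python
import pandas as pd

# toy data (hypothetical): RN '814' appears twice
df = pd.DataFrame({"freq": [5, 2, 9, 3], "RN": ["814", "848", "814", "829"]})

# one row per RN, freq summed over duplicates, then sorted by the summed freq
dfg = df.groupby("RN", as_index=False)["freq"].sum().sort_values("freq")

print(dfg)
```

Here the two '814' rows (5 and 9) are combined into a single row with freq 14, which ends up last after sorting.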
If RN is unique, no grouping is needed:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# create test data
np.random.seed(365)
data = {'freq': np.random.randint(20, size=(20,)), 'RN': np.arange(800, 820)}
df = pd.DataFrame(data)
# convert RN to a str type
df.RN = df.RN.astype(str)
# sort `freq`
df.sort_values('freq', ascending=True, inplace=True)
# plot
plt.scatter('RN', 'freq', data=df)
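An alternative that is not part of the answer above, sketched here under the assumption that duplicate RN values should each keep their own point: plot against the row position and use the RN values only as tick labels. The toy data is invented for illustration, and the Agg backend is assumed only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# toy data (hypothetical): RN 814 appears twice
df = pd.DataFrame({"freq": [5, 2, 9, 2], "RN": [814, 848, 814, 829]})
# stable sort keeps the original relative order of rows with equal freq
df = df.sort_values("freq", kind="stable").reset_index(drop=True)

positions = np.arange(len(df))           # one x position per row
labels = df["RN"].astype(str).tolist()   # RN used only as tick text

fig, ax = plt.subplots()
ax.scatter(positions, df["freq"])
ax.set_xticks(positions)
ax.set_xticklabels(labels)
```

Because every row gets its own x position, repeated RN labels are not merged onto a single tick, at the cost of the same label appearing more than once on the axis.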
Upvotes: 1