Vaibhav Singh
Vaibhav Singh

Reputation: 1209

Python Scatter Plot - with smooth Line

I have a dataframe as below (obtained after lot of preprocessing)

Please find dataframe

d = {'token': {361: '180816_031', 119: '180816_031', 101: '180816_031', 135: '180816_031', 292: '180816_031',
           133: '180816_031', 99: '180816_031', 270: '180816_031', 19: '180816_031', 382: '180816_031',
           414: '180816_031', 267: '180816_031', 218: '180816_031', 398: '180816_031', 287: '180816_031',
           155: '180816_031', 392: '180816_031', 265: '180816_031', 239: '180816_031', 237: '180816_031'},
 'station': {361: 'deneb', 119: 'callisto', 101: 'callisto', 135: 'callisto', 292: 'callisto', 133: 'deneb',
             99: 'callisto', 270: 'callisto', 19: 'deneb', 382: 'callisto', 414: 'deneb', 267: 'callisto',
             218: 'deneb', 398: 'callisto', 287: 'deneb', 155: 'deneb', 392: 'deneb', 265: 'callisto',
             239: 'callisto', 237: 'callisto'},
 'cycle_number': {361: 'cycle09', 119: 'cycle06', 101: 'cycle04', 135: 'cycle01', 292: 'cycle04', 133: 'cycle05',
                  99: 'cycle06', 270: 'cycle07', 19: 'cycle04', 382: 'cycle08', 414: 'cycle04', 267: 'cycle10',
                  218: 'cycle07', 398: 'cycle08', 287: 'cycle09', 155: 'cycle08', 392: 'cycle06', 265: 'cycle02',
                  239: 'cycle09', 237: 'cycle07'},
 'variable': {361: 'adj_high_quality_reads', 119: 'short_pass', 101: 'short_pass', 135: 'cell_mask_bilayers_sum',
              292: 'adj_active_polymerase', 133: 'cell_mask_bilayers_sum', 99: 'short_pass',
              270: 'adj_active_polymerase', 19: 'Unnamed: 0', 382: 'adj_high_quality_reads',
              414: 'num_align_high_quality_reads', 267: 'adj_active_polymerase', 218: 'adj_single_pores',
              398: 'num_align_high_quality_reads', 287: 'adj_active_polymerase', 155: 'cell_mask_bilayers_sum',
              392: 'num_align_high_quality_reads', 265: 'adj_active_polymerase', 239: 'adj_single_pores',
              237: 'adj_single_pores'},
 'value': {361: 99704.0, 119: 2072785.0, 101: 2061059.0, 135: 1682208.0, 292: 675306.0, 133: 1714292.0,
           99: 2072785.0, 270: 687988.0, 19: 19.0, 382: np.nan, 414: 285176.0, 267: 86914.0, 218: 948971.0,
           398: 405196.0, 287: 137926.0, 155: 1830032.0, 392: 480081.0, 265: 951689.0, 239: 681452.0,
           237: 882671.0}}

Data:

          token   station cycle_number                      variable  \
19   180816_031     deneb      cycle04                    Unnamed: 0   
99   180816_031  callisto      cycle06                    short_pass   
101  180816_031  callisto      cycle04                    short_pass   
119  180816_031  callisto      cycle06                    short_pass   
133  180816_031     deneb      cycle05        cell_mask_bilayers_sum   
135  180816_031  callisto      cycle01        cell_mask_bilayers_sum   
155  180816_031     deneb      cycle08        cell_mask_bilayers_sum   
218  180816_031     deneb      cycle07              adj_single_pores   
237  180816_031  callisto      cycle07              adj_single_pores   
239  180816_031  callisto      cycle09              adj_single_pores   
265  180816_031  callisto      cycle02         adj_active_polymerase   
267  180816_031  callisto      cycle10         adj_active_polymerase   
270  180816_031  callisto      cycle07         adj_active_polymerase   
287  180816_031     deneb      cycle09         adj_active_polymerase   
292  180816_031  callisto      cycle04         adj_active_polymerase   
361  180816_031     deneb      cycle09        adj_high_quality_reads   
382  180816_031  callisto      cycle08        adj_high_quality_reads   
392  180816_031     deneb      cycle06  num_align_high_quality_reads   
398  180816_031  callisto      cycle08  num_align_high_quality_reads   
414  180816_031     deneb      cycle04  num_align_high_quality_reads   

         value  
19        19.0  
99   2072785.0  
101  2061059.0  
119  2072785.0  
133  1714292.0  
135  1682208.0  
155  1830032.0  
218   948971.0  
237   882671.0  
239   681452.0  
265   951689.0  
267    86914.0  
270   687988.0  
287   137926.0  
292   675306.0  
361    99704.0  
382        NaN  
392   480081.0  
398   405196.0  
414   285176.0  

I am trying to create scatterplot with smooth line

fig,ax = plt.subplots()
fig.set_size_inches(16,4)

#to get different colors for each of the `variable` value assign the variable to hue
g2=sns.lmplot(x='cycle_number',y='value',data=df, hue='variable',  size=4, aspect=5)

This code gives plots only one value for scatter plot, however my expected output is as below

Expected Output:

enter image description here

TRIES RESULT

TRY 1

I tried to create bar plot (with some help) & i was successful but with scatter plot I could not do it

Below code to convert same to bar

df1 = df.groupby(['token','variable']).agg({'value': 'mean'})
df1.reset_index(inplace=True)
df1.sort_values('value',inplace=True,ascending=False)
fig,ax = plt.subplots()
fig.set_size_inches(16,8)
#to get different colors for each of the variable assign the variable to hue
g=sns.barplot(x='token',y='value',data=df1, hue='variable',ax=ax)
#Code for to put legend outside the plot
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
# Put a legend to the right of the current axis
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
# Adding respective values to the top of each bar
for p in ax.patches: 
    ax.annotate("%d" % p.get_height(), (p.get_x() + p.get_width() / 2, p.get_height()),
                ha='center', va='center', fontsize=11, color='black', xytext=(0, 10), 
                textcoords='offset points',fontweight='bold')
plt.show()

TRY 2

g2=sns.lmplot(x='cycle_number',y='value',data=df), this gives error 
ValueError: could not convert string to float: 'cycle10'

I know what the error means here but I am feeling helpless trying to replicate to output code

TRY 3:

sns.lmplot('cycle_number', 'value', data=df, hue='variable', fit_reg=False)

OUTPUT GENERATED: Blank GRID

Upvotes: 0

Views: 14195

Answers (1)

Nihal
Nihal

Reputation: 5334

use:

sns.pointplot('cycle_number', 'value', data=df, hue='variable')

enter image description here

documnetation: https://seaborn.pydata.org/generated/seaborn.pointplot.html

OUTPUT GENERATE USING THIS VS EXPECTED OUTPUT

enter image description here

Expected OUTPUT

Try this:

df = pd.DataFrame(d)

df['cycle_number'] = df['cycle_number'].str.replace('cycle', '')
df['cycle_number'] = df['cycle_number'].apply(pd.to_numeric)
print(df)

fig, ax = plt.subplots()
fig.set_size_inches(16, 4)
# sns.pointplot('cycle_number', 'value', data=df, hue='variable', err_style="bars", ci=68)
sns.lmplot('cycle_number', 'value', data=df, hue='variable', ci=None, order=2, truncate=True)
# use order = 5 to see more curve

output for order=2

enter image description here

OUTPUT as per the latest code shared ( for order=2)

Final Output - The legend is overlapping with graph area

OUTPUT 4 ( for order=5)*: USING ORDER =5

The graph curve is perfectly fine except that the legend overlaps with plotting area.

Upvotes: 6

Related Questions