HungryMolecule

Reputation: 337

Calculate gap between two datasets (pandas, matplotlib, fill_between already used)

I'd like to ask for suggestions on how to calculate the length of the gap between two datasets plotted with matplotlib from a pandas DataFrame. Ideally, I would like to have these gap values written in the plot and also, if possible, included in the DataFrame. Here is my simplified example DataFrame:

import pandas as pd
d = {'Mean-1': [0.195842, 0.295069, 0.321345, 0.773725], 'SEM-1': [0.001216, 0.002687, 0.005267, 0.029974], 'Mean-2': [0.143103, 0.250505, 0.305767, 0.960804],'SEM-2': [0.000959, 0.001368, 0.003722, 0.150025], 'Atom Number': [1, 3, 5, 7]}
df=pd.DataFrame(d)
df

    Mean-1      SEM-1       Mean-2      SEM-2     Atom Number
0   0.195842    0.001216    0.143103    0.000959    1
1   0.295069    0.002687    0.250505    0.001368    3
2   0.321345    0.005267    0.305767    0.003722    5
3   0.773725    0.029974    0.960804    0.150025    7

Then I made a plot with two lines representing Mean-1 and Mean-2, plus a shaded area around each line representing the standard error of the mean (SEM). This is done for the selected atom numbers.

import matplotlib.pyplot as plt

ax = df.plot(x='Atom Number', y=['Mean-1','Mean-2'])

y_1 = df['Mean-1']
y_2 = df['Mean-2']
x = df['Atom Number']

error_1 = df['SEM-1']
error_2 = df['SEM-2']

ax.fill_between(df['Atom Number'], y_1-error_1, y_1+error_1, alpha=0.2, edgecolor='#CC4F1B', facecolor='#FF9848')
ax.fill_between(df['Atom Number'], y_2-error_2, y_2+error_2, alpha=0.2, edgecolor='#3F7F4C', facecolor='#7EFF99')
plt.xticks(x)

[image: Mean-1 and Mean-2 plotted against Atom Number, each with a shaded SEM band]

What I would like to do next is calculate the gap for each residue (atom number). The gap is only the white space, i.e. the space where neither the lines nor the shaded areas (SEMs) overlap. I would also like to know whether I can somehow print the gap values in the plot and save them into a column. Thank you for any suggestions.

Upvotes: 0

Views: 101

Answers (2)

Scott Boston

Reputation: 153510

IIUC, do you want something like this:

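import numpy as np
import matplotlib.pyplot as plt

ax = df.plot(x='Atom Number', y=['Mean-1','Mean-2'], figsize=(15,8))

y_1 = df['Mean-1']
y_2 = df['Mean-2']
x = df['Atom Number']

error_1 = df['SEM-1']
error_2 = df['SEM-2']

ax.fill_between(df['Atom Number'], y_1-error_1, y_1+error_1, alpha=0.2, edgecolor='#CC4F1B', facecolor='#FF9848')
ax.fill_between(df['Atom Number'], y_2-error_2, y_2+error_2, alpha=0.2, edgecolor='#3F7F4C', facecolor='#7EFF99')

# Facing edges of the two bands: the bottom of whichever band is on top
# and the top of the band underneath it.
top_lower = np.where(y_1 >= y_2, y_1-error_1, y_2-error_2)
bottom_upper = np.where(y_1 >= y_2, y_2+error_2, y_1+error_1)
# Shade the white space between the two bands.
ax.fill_between(df['Atom Number'], bottom_upper, top_lower, alpha=.2, edgecolor='k', facecolor='blue')

for i in range(len(x)):
    # Gap between the bands (negative if they overlap).
    gap = top_lower[i] - bottom_upper[i]
    ylabel = (top_lower[i] + bottom_upper[i]) / 2
    # Put the label above line 1 if it is on top, below it otherwise.
    direction = 1 if y_1[i] >= y_2[i] else -1
    _ = ax.annotate(f'{gap:0.4f}', xy=(x[i],ylabel), xytext=(x[i]-.14,y_1[i]+direction*.2), arrowprops=dict(arrowstyle="-"))
plt.xticks(x);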

Output:

[image: resulting plot with the gap values annotated]
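The loop above only annotates the plot, while the question also asks for the values in a column. A minimal sketch of that, assuming the gap is measured between the facing edges of the two mean ± SEM bands and that a negative value means the bands overlap; the column name `Gap` is just an illustrative choice:

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    'Mean-1': [0.195842, 0.295069, 0.321345, 0.773725],
    'SEM-1': [0.001216, 0.002687, 0.005267, 0.029974],
    'Mean-2': [0.143103, 0.250505, 0.305767, 0.960804],
    'SEM-2': [0.000959, 0.001368, 0.003722, 0.150025],
    'Atom Number': [1, 3, 5, 7],
})

y_1, error_1 = df['Mean-1'], df['SEM-1']
y_2, error_2 = df['Mean-2'], df['SEM-2']

# When line 1 is on top, the white space runs from the top of band 2 up to the
# bottom of band 1; otherwise from the top of band 1 up to the bottom of band 2.
df['Gap'] = np.where(y_1 >= y_2,
                     (y_1 - error_1) - (y_2 + error_2),
                     (y_2 - error_2) - (y_1 + error_1))

# Print the values, e.g. to compare them with the annotations in the plot.
print(df[['Atom Number', 'Gap']].round(4))
```

The `Gap` column can then feed the `ax.annotate` loop instead of recomputing the value inside it.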

Upvotes: 1

It's not a compact solution, but you could try something like this (check the order of things). First calculate all the positions, i.e. the upper and lower limits of each band (mean ± SEM):

import numpy as np

y_1, error_1 = df['Mean-1'], df['SEM-1']
y_2, error_2 = df['Mean-2'], df['SEM-2']

df['y1_upper'] = y_1+error_1
df['y1_lower'] = y_1-error_1
df['y2_upper'] = y_2+error_2
df['y2_lower'] = y_2-error_2

which gives

    Mean-1     SEM-1    Mean-2     SEM-2  Atom Number  y1_upper  y1_lower  \
0  0.195842  0.001216  0.143103  0.000959            1  0.197058  0.194626   
1  0.295069  0.002687  0.250505  0.001368            3  0.297756  0.292382   
2  0.321345  0.005267  0.305767  0.003722            5  0.326612  0.316078   
3  0.773725  0.029974  0.960804  0.150025            7  0.803699  0.743751   

   y2_upper  y2_lower     
0  0.144062  0.142144  
1  0.251873  0.249137  
2  0.309489  0.302045  
3  1.110829  0.810779  

The distances (gaps) are calculated differently depending on whether y_1 is above y_2 or vice versa. So use conditions on the upper and lower limits and subtract the facing limits row by row:

conditions = [
    (df['y1_lower'] >= df['y2_upper']),
    (df['y1_lower'] < df['y2_upper'])]
choices = [df['y1_lower']-df['y2_upper'], df['y2_lower']-df['y1_upper']]
df['dist'] = np.select(conditions, choices)

This gives

    Mean-1     SEM-1    Mean-2     SEM-2  Atom Number  y1_upper  y1_lower  \
0  0.195842  0.001216  0.143103  0.000959            1  0.197058  0.194626   
1  0.295069  0.002687  0.250505  0.001368            3  0.297756  0.292382   
2  0.321345  0.005267  0.305767  0.003722            5  0.326612  0.316078   
3  0.773725  0.029974  0.960804  0.150025            7  0.803699  0.743751   

   y2_upper  y2_lower      dist  
0  0.144062  0.142144  0.050564  
1  0.251873  0.249137  0.040509  
2  0.309489  0.302045  0.006589  
3  1.110829  0.810779  0.007080  

As I said, check the order, but this is a possible solution.
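If you would rather not check the order by hand, the distance between two intervals can be written without any conditions: the larger of the two lower limits minus the smaller of the two upper limits. A minimal sketch, assuming the same mean ± SEM bands; the result is positive when the bands are separated and negative (minus the overlap length) when they overlap, and `dist_clipped` is a hypothetical extra column for when overlap should simply count as zero:

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    'Mean-1': [0.195842, 0.295069, 0.321345, 0.773725],
    'SEM-1': [0.001216, 0.002687, 0.005267, 0.029974],
    'Mean-2': [0.143103, 0.250505, 0.305767, 0.960804],
    'SEM-2': [0.000959, 0.001368, 0.003722, 0.150025],
    'Atom Number': [1, 3, 5, 7],
})

# Upper and lower limits of each band (mean +/- SEM).
y1_lower = df['Mean-1'] - df['SEM-1']
y1_upper = df['Mean-1'] + df['SEM-1']
y2_lower = df['Mean-2'] - df['SEM-2']
y2_upper = df['Mean-2'] + df['SEM-2']

# Distance between [y1_lower, y1_upper] and [y2_lower, y2_upper]: the higher
# lower limit minus the lower upper limit. This does not depend on which band
# is on top; a negative value is the length of the overlap.
df['dist'] = np.maximum(y1_lower, y2_lower) - np.minimum(y1_upper, y2_upper)

# Optional: report overlapping bands as a gap of zero.
df['dist_clipped'] = df['dist'].clip(lower=0)

print(df[['Atom Number', 'dist', 'dist_clipped']])
```

Because `np.maximum` and `np.minimum` work element-wise on the two Series, every row gets its own value, and no `np.select` branches are needed.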

Upvotes: 1
