Reputation: 1592
I have a pandas DataFrame that looks similar to this (date is the index):
>>> J01B_X J01B_y J02C_x J02C_y...
date
2019-06-23 0.45 1.12 4.56 1.1
2019-06-24 0.22 1.18 5.5 0.8
2019-06-25 0.35 1.10 6.1 8.3
...
I want to calculate the slope based on the X and Y values that are in the columns:
(0.45 1.12, 0.22 1.18, 0.35 1.10) -> slope for observation J01B based on J01B_X and J01B_y
(4.56 1.1, 5.5 0.8, 6.1 8.3) -> slope for observation J02C based on J02C_x and J02C_y
The thing is that I have 58 columns like this, and each slope has to be calculated from two columns at a time.
In the end I would like to have one row, separate from the original table, with the slope calculated from each pair of columns, something like this (these are fake numbers):
>>> J01B J02C ....
0.13 0.05
Is there any way to do something like this?
Upvotes: 0
Views: 2637
Reputation: 1075
The example creates a pandas Series, which is a one-dimensional pandas object, much like a single row. You can create a DataFrame from it if you wish.
import pandas as pd
from scipy import stats
# linregress returns a result object; .slope is its first field
slopeB = stats.linregress(df['J01B_X'], df['J01B_y']).slope
slopeC = stats.linregress(df['J02C_x'], df['J02C_y']).slope
# Create a pandas Series with the slope data
slopes = pd.Series([slopeB, slopeC], index=['J01B', 'J02C'], name="Slope")
# Transpose the one-column DataFrame to get a single row
slopedf = pd.DataFrame(slopes).T
slopes looks like this:
J01B -0.278195
J02C 4.233791
Name: Slope, dtype: float64
slopedf looks like this and is a DataFrame with one row:
J01B J02C
Slope -0.278195 4.233791
Both slopes and slopedf can be queried the same way, but indexing the Series returns the numerical value of the entry, while indexing slopedf returns a single-element Series with the data. Even though the Series is printed as a column, I think this is what you want.
#output of slopes['J01B']
-0.2781954887218037
#output of slopedf['J01B']
Slope -0.278195
Name: J01B, dtype: float64
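Since the question mentions 58 columns, writing one linregress call per pair by hand gets tedious. A minimal sketch of a loop-based version follows. It assumes every column is named <prefix>_x or <prefix>_y (in either case, as in the sample) and that there are no missing values. The helper name pair_slopes is hypothetical, and the small df is rebuilt from the sample rows in the question only so that the snippet runs on its own.

```python
import pandas as pd
from scipy import stats

# Sample rows copied from the question; the real frame has many more columns.
df = pd.DataFrame(
    {
        "J01B_X": [0.45, 0.22, 0.35],
        "J01B_y": [1.12, 1.18, 1.10],
        "J02C_x": [4.56, 5.5, 6.1],
        "J02C_y": [1.1, 0.8, 8.3],
    },
    index=pd.to_datetime(["2019-06-23", "2019-06-24", "2019-06-25"]),
)
df.index.name = "date"


def pair_slopes(frame):
    """Hypothetical helper: one slope per prefix, pairing <prefix>_x with <prefix>_y."""
    # Group column names by prefix; the x/y suffix is matched case-insensitively.
    pairs = {}
    for col in frame.columns:
        prefix, sep, axis = col.rpartition("_")
        if sep and axis.lower() in ("x", "y"):
            pairs.setdefault(prefix, {})[axis.lower()] = col

    # Only prefixes that have both an x and a y column get a slope.
    result = {
        prefix: stats.linregress(frame[cols["x"]], frame[cols["y"]]).slope
        for prefix, cols in pairs.items()
        if "x" in cols and "y" in cols
    }
    return pd.Series(result, name="Slope")


slopes = pair_slopes(df)        # Series indexed by prefix
slopedf = slopes.to_frame().T   # single-row DataFrame
print(slopedf)
```

On the sample data this should reproduce the two slopes shown above. If the real columns contain NaN values, they would need to be dropped per pair before calling linregress.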
Upvotes: 3