Reza energy

Reputation: 135

Pandas: splitting data frame based on the slope of data

I have this data frame

x = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})


Update: I want a function that, if the slope is negative and the length of the group is more than 2, returns True together with the start and end index of the group. For this case it should return: result=True, index=5, index=8

1- I want to split the data frame into groups based on the slope. This example should have 6 groups.

2- How can I check the length of each group?


I tried to get the groups with the code below, but I don't know how to split the data frame or how to check the length of each part.

New update: Thanks to Matt W. for his code. I finally found a solution.

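import numpy as np
import pandas as pd

df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
df['diff'] = df.entity.diff().fillna(0)
# Collapse every negative difference to -1 so falling rows share one label
df.loc[df['diff'] < 0, 'diff'] = -1

# Start a new group whenever 'diff' differs from the previous row
init = [0]
for x in df['diff'] == df['diff'].shift(1):
    if x:
        init.append(init[-1])
    else:
        init.append(init[-1]+1)
df['g'] = init[1:]

def get_slope(df):
    # Least-squares slope of the first column against the row index
    x=np.array(df.iloc[:,0].index)
    y=np.array(df.iloc[:,0])
    X = x - x.mean()
    Y = y - y.mean()
    slope = (X.dot(Y)) / (X.dot(X))
    return slope

df.groupby('g').apply(get_slope)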

Result

0    NaN
1    NaN
2    NaN
3    0.0
4    NaN
5   -1.5
6    NaN
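
A note on the NaN entries: for a group with a single row, both X.dot(Y) and X.dot(X) are zero, so the division is 0/0. A minimal sketch of the same least-squares formula with an explicit guard follows; the helper name least_squares_slope is made up for illustration and is not part of the code above.

```python
import numpy as np


def least_squares_slope(x, y):
    """Slope of the least-squares line through (x, y); NaN for fewer than 2 points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size < 2:
        return np.nan  # a single point does not define a slope
    X = x - x.mean()
    Y = y - y.mean()
    return X.dot(Y) / X.dot(X)


# Rows 6..8 of the example hold the values 3, 2, 0
slope = least_squares_slope([6, 7, 8], [3, 2, 0])
```

For rows 6 to 8 this gives -1.5, the same value as in the result above.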

Upvotes: 0

Views: 465

Answers (2)

Andrew Schonfeld

Reputation: 110

Just wanted to present another solution that doesn't require a for-loop:

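import pandas as pd

df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
df['diff'] = df.entity.diff().bfill()
# Collapse every negative difference to -1 so falling rows share one label
df.loc[df['diff'] < 0, 'diff'] = -1
# A new group starts wherever 'diff' differs from the previous row
df['g'] = (~(df['diff'] == df['diff'].shift(1))).cumsum()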
df
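
On the second part of the question (the length of each group), here is a rough sketch that builds on the same g column. The names group_sizes and bounds are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'entity': [5, 7, 5, 5, 5, 6, 3, 2, 0, 5]})
df['diff'] = df['entity'].diff().bfill()
df.loc[df['diff'] < 0, 'diff'] = -1
df['g'] = (df['diff'] != df['diff'].shift(1)).cumsum()

# Number of rows in each group, indexed by the group label
group_sizes = df.groupby('g').size()

# First and last row index of each group
bounds = df.index.to_series().groupby(df['g']).agg(['min', 'max'])

# Labels of the groups that have more than 2 rows
long_groups = group_sizes[group_sizes > 2].index.tolist()
```

With this data only group 5 (rows 6 to 8) has more than 2 rows.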

Upvotes: 1

Matt W.

Reputation: 3722

Take the difference and bfill() the start so that the 0th element gets the same value as the 1st. Then set all negative differences to the same value (-1) so they count as the same "slope". Then I shift the column to check whether each value equals the previous one and iterate through the result, which gives a list that increments whenever the slope changes; that list is assigned to g.

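import pandas as pd

df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
df['diff'] = df.entity.diff().bfill()
# Collapse every negative difference to -1
df.loc[df['diff'] < 0, 'diff'] = -1

# Increment the group counter whenever 'diff' differs from the previous row
init = [0]
for x in df['diff'] == df['diff'].shift(1):
    if x:
        init.append(init[-1])
    else:
        init.append(init[-1]+1)
df['g'] = init[1:]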
df
   entity  diff  g
0       5   2.0  1
1       7   2.0  1
2       5  -1.0  2
3       5   0.0  3
4       5   0.0  3
5       6   1.0  4
6       3  -1.0  5
7       2  -1.0  5
8       0  -1.0  5
9       5   5.0  6
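
For the function asked for in the update, one possible sketch is below. It rests on two assumptions that the question does not spell out: "length more than 2" is read as at least 3 consecutive falling rows, and the start index is taken as the row just before the first drop, which is what makes the expected index=5 come out. The name find_negative_run and its parameters are hypothetical.

```python
import pandas as pd


def find_negative_run(df, col='entity', min_len=3):
    """Return (True, start, end) for the first long falling run, else (False, None, None)."""
    falling = df[col].diff() < 0  # True where the value dropped from the previous row
    # Label runs of consecutive equal flags
    run_id = (falling != falling.shift(fill_value=False)).cumsum()
    for _, run in df[falling].groupby(run_id[falling]):
        if len(run) >= min_len:
            # Assumption: the run starts at the row just before the first drop
            start_pos = df.index.get_loc(run.index[0]) - 1
            return True, df.index[start_pos], run.index[-1]
    return False, None, None


df = pd.DataFrame({'entity': [5, 7, 5, 5, 5, 6, 3, 2, 0, 5]})
result, start, end = find_negative_run(df)
```

For the example frame this returns True with start 5 and end 8. The single drop at row 2 is skipped because it is too short.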

Upvotes: 2
