Reputation: 544
I have a pandas dataframe
with sorting on column 'DT'
like this:
S DT
100 2000-12-12 05:00:00
100 2000-12-12 05:00:50
89 2000-12-12 05:01:20
89 2000-12-12 05:02:00
89 2000-12-12 05:02:35
98 2000-12-12 05:03:15
98 2000-12-12 05:03:50
98 2000-12-12 05:04:28
98 2000-12-12 05:05:05
112 2000-12-12 05:05:47
112 2000-12-12 05:06:15
112 2000-12-12 05:07:00
How can I find the previous of any given value in column 'S'
from this data?
Like for S = 112, its previous should give 98, for 98 it should give prev = 89 and so on. I would like to store the previous values for any given 'S' in a separate variable which I can later access in my code. Any help will be deeply appreciated as I am new to the world of coding.
Upvotes: 4
Views: 213
Reputation: 862406
Idea is use Series.shift
, replace all rows without last of consecutive groups by Series.where
to missing values and last forward filling mising values.
Solution also working if multiple groups with same values, like changed 89
to 112
.
df['prev'] = df['S'].shift().where(df['S'].ne(df['S'].shift())).ffill()
print (df)
S DT prev
0 100 2000-12-12 05:00:00 NaN
1 100 2000-12-12 05:00:50 NaN
2 89 2000-12-12 05:01:20 100.0
3 89 2000-12-12 05:02:00 100.0
4 89 2000-12-12 05:02:35 100.0
5 98 2000-12-12 05:03:15 89.0
6 98 2000-12-12 05:03:50 89.0
7 98 2000-12-12 05:04:28 89.0
8 98 2000-12-12 05:05:05 89.0
9 112 2000-12-12 05:05:47 98.0
10 112 2000-12-12 05:06:15 98.0
11 112 2000-12-12 05:07:00 98.0
If need only previous values in new DataFrame
:
df1 = df.assign(prev=df['S'].shift()).loc[df['S'].ne(df['S'].shift()), ['S','prev']]
print (df1)
S prev
0 100 NaN
2 89 100.0
5 98 89.0
9 112 98.0
Upvotes: 2
Reputation: 3598
Another way using lambda
function:
def frst(x):
if len(x):
return x[0]
df['prev'] = df.apply(lambda x: frst(df[(df.index<x.name) & (df.S != x.S)].S.tail(1).values), axis = 1)
Result:
S DT prev
0 100 2000-12-12 05:00:00 NaN
1 100 2000-12-12 05:00:50 NaN
2 89 2000-12-12 05:01:20 100.0
3 89 2000-12-12 05:02:00 100.0
4 89 2000-12-12 05:02:35 100.0
5 98 2000-12-12 05:03:15 89.0
6 98 2000-12-12 05:03:50 89.0
7 98 2000-12-12 05:04:28 89.0
8 98 2000-12-12 05:05:05 89.0
9 112 2000-12-12 05:05:47 98.0
10 112 2000-12-12 05:06:15 98.0
11 112 2000-12-12 05:07:00 98.0
Above method is much slower and not "pythonic" as previous solutions, so consider it only for educational purposes :-)
Upvotes: 0
Reputation: 18647
Assuming the values of S
are unique, you could use Series.unique
and create a dict
with zip
and slicing. Use Series.map
if you need to assign values back into df
:
unique_S = df['S'].unique()
d = dict(zip(unique_S[1:], unique_S))
# print(d)
# {89: 100, 98: 89, 112: 98}
# If you need to assign back into DataFrame, use map
df['prev'] = df['S'].map(d)
If these are unique "contiguous groups" and not unique values, you could use cumsum
, groupby
and shift
to get the previous groups value:
groups = df['S'].ne(df['S'].shift()).cumsum()
prev_map = df['S'].groupby(groups).first().shift()
df['prev'] = groups.map(prev_map)
[out]
S DT prev
0 100 2000-12-12 05:00:00 NaN
1 100 2000-12-12 05:00:50 NaN
2 89 2000-12-12 05:01:20 100.0
3 89 2000-12-12 05:02:00 100.0
4 89 2000-12-12 05:02:35 100.0
5 98 2000-12-12 05:03:15 89.0
6 98 2000-12-12 05:03:50 89.0
7 98 2000-12-12 05:04:28 89.0
8 98 2000-12-12 05:05:05 89.0
9 112 2000-12-12 05:05:47 98.0
10 112 2000-12-12 05:06:15 98.0
11 112 2000-12-12 05:07:00 98.0
Upvotes: 2