Prachi
Prachi

Reputation: 544

How to find previous of each unique value in a column based upon datetime column values in a dataframe?

I have a pandas dataframe with sorting on column 'DT' like this:

 S             DT             
100    2000-12-12 05:00:00
100    2000-12-12 05:00:50     
89     2000-12-12 05:01:20     
89     2000-12-12 05:02:00     
89     2000-12-12 05:02:35     
98     2000-12-12 05:03:15     
98     2000-12-12 05:03:50     
98     2000-12-12 05:04:28     
98     2000-12-12 05:05:05     
112    2000-12-12 05:05:47
112    2000-12-12 05:06:15
112    2000-12-12 05:07:00

How can I find the previous of any given value in column 'S' from this data?

Like for S = 112, its previous should give 98, for 98 it should give prev = 89 and so on. I would like to store the previous values for any given 'S' in a separate variable which I can later access in my code. Any help will be deeply appreciated as I am new to the world of coding.

Upvotes: 4

Views: 213

Answers (3)

jezrael
jezrael

Reputation: 862406

Idea is use Series.shift, replace all rows without last of consecutive groups by Series.where to missing values and last forward filling mising values.

Solution also working if multiple groups with same values, like changed 89 to 112.

df['prev'] = df['S'].shift().where(df['S'].ne(df['S'].shift())).ffill()
print (df)
      S                   DT   prev
0   100  2000-12-12 05:00:00    NaN
1   100  2000-12-12 05:00:50    NaN
2    89  2000-12-12 05:01:20  100.0
3    89  2000-12-12 05:02:00  100.0
4    89  2000-12-12 05:02:35  100.0
5    98  2000-12-12 05:03:15   89.0
6    98  2000-12-12 05:03:50   89.0
7    98  2000-12-12 05:04:28   89.0
8    98  2000-12-12 05:05:05   89.0
9   112  2000-12-12 05:05:47   98.0
10  112  2000-12-12 05:06:15   98.0
11  112  2000-12-12 05:07:00   98.0

If need only previous values in new DataFrame:

df1 = df.assign(prev=df['S'].shift()).loc[df['S'].ne(df['S'].shift()), ['S','prev']]
print (df1)
     S   prev
0  100    NaN
2   89  100.0
5   98   89.0
9  112   98.0

Upvotes: 2

ipj
ipj

Reputation: 3598

Another way using lambda function:

def frst(x):
    if len(x):
        return x[0]

df['prev'] = df.apply(lambda x: frst(df[(df.index<x.name) & (df.S != x.S)].S.tail(1).values), axis = 1)

Result:

      S                   DT   prev
0   100  2000-12-12 05:00:00    NaN
1   100  2000-12-12 05:00:50    NaN
2    89  2000-12-12 05:01:20  100.0
3    89  2000-12-12 05:02:00  100.0
4    89  2000-12-12 05:02:35  100.0
5    98  2000-12-12 05:03:15   89.0
6    98  2000-12-12 05:03:50   89.0
7    98  2000-12-12 05:04:28   89.0
8    98  2000-12-12 05:05:05   89.0
9   112  2000-12-12 05:05:47   98.0
10  112  2000-12-12 05:06:15   98.0
11  112  2000-12-12 05:07:00   98.0

Above method is much slower and not "pythonic" as previous solutions, so consider it only for educational purposes :-)

Upvotes: 0

Chris Adams
Chris Adams

Reputation: 18647

Assuming the values of S are unique, you could use Series.unique and create a dict with zip and slicing. Use Series.map if you need to assign values back into df:

unique_S = df['S'].unique()
d = dict(zip(unique_S[1:], unique_S))

# print(d)
# {89: 100, 98: 89, 112: 98}

# If you need to assign back into DataFrame, use map
df['prev'] = df['S'].map(d)

If these are unique "contiguous groups" and not unique values, you could use cumsum, groupby and shift to get the previous groups value:

groups = df['S'].ne(df['S'].shift()).cumsum()
prev_map = df['S'].groupby(groups).first().shift()

df['prev'] = groups.map(prev_map)

[out]

      S                   DT   prev
0   100  2000-12-12 05:00:00    NaN
1   100  2000-12-12 05:00:50    NaN
2    89  2000-12-12 05:01:20  100.0
3    89  2000-12-12 05:02:00  100.0
4    89  2000-12-12 05:02:35  100.0
5    98  2000-12-12 05:03:15   89.0
6    98  2000-12-12 05:03:50   89.0
7    98  2000-12-12 05:04:28   89.0
8    98  2000-12-12 05:05:05   89.0
9   112  2000-12-12 05:05:47   98.0
10  112  2000-12-12 05:06:15   98.0
11  112  2000-12-12 05:07:00   98.0

Upvotes: 2

Related Questions