user8034918

Reputation: 439

How to get the value of one column based on another column value

I have a data set containing 4 columns. For each combination of col1 and col3, I would like to get the last entry of the time column. For col1 = 1 the last entry is 2018-05-01 23:34:00, and for col1 = 2 it is 2018-05-01 05:48:00. I then want to check whether the difference between the last entry and the second-to-last entry (for col1 = 1, that is 2018-04-30 01:58:00) is greater than 24 hours. The same applies to col1 = 2.

If so, I want to get the values in col2 corresponding to the last and second-to-last entries, and set the value in col2 corresponding to the second-to-last entry of time to null.

Here is part of data:

       time           col1     col2     col3
 2018-04-29 02:12:00    1       50        3
 2018-04-30 01:58:00    1       40        3
 2018-05-01 23:34:00    1       11        3
 2018-04-29 02:02:00    2       10        4
 2018-05-01 05:48:00    2       25        4

The output that I would like to have is:

 col1     col2         col3
  1       [11, Null]     3
  2       [25, Null]     4

How can I filter the values in col2 to produce the output above, given the 4 columns that I have initially?

Upvotes: 0

Views: 134

Answers (1)

Karn Kumar

Reputation: 8816

This may help.

>>> df
                  time  col1  col2  col3
0  2018-04-29 02:12:00     1    50     3
1  2018-04-30 01:58:00     1    40     3
2  2018-05-01 23:34:00     1    11     3
3  2018-04-29 02:02:00     2    10     4
4  2018-05-01 05:48:00     2    25     4
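
For anyone who wants to reproduce this, here is a minimal sketch that builds the frame from the rows given in the question. It assumes the time column should be parsed to a datetime dtype (the question does not say how the data is stored), since timestamps can only be subtracted once they are real datetimes.

```python
import pandas as pd

# Rows copied from the question. Parsing 'time' with pd.to_datetime is an
# assumption: it makes the column datetime64 so differences can be computed.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2018-04-29 02:12:00",
        "2018-04-30 01:58:00",
        "2018-05-01 23:34:00",
        "2018-04-29 02:02:00",
        "2018-05-01 05:48:00",
    ]),
    "col1": [1, 1, 1, 2, 2],
    "col2": [50, 40, 11, 10, 25],
    "col3": [3, 3, 3, 4, 4],
})

print(df.dtypes)
```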

We can get the last row of each col1 group as below:

>>> df.groupby(['col1'], as_index=False).last()
   col1                time  col2  col3
0     1 2018-05-01 23:34:00    11     3
1     2 2018-05-01 05:48:00    25     4

Now we can store that into a variable as last_diff for later comparison.

>>> last_diff = df.groupby(['col1'], as_index=False).last()
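
Note that last() picks the last value in the frame's current row order, which is not necessarily the latest timestamp. A hedged sketch of a safer variant, assuming time is a datetime column: sort by time first, then group by both col1 and col3 as the question asks. The name last_rows and the deliberately shuffled sample rows are only for illustration.

```python
import pandas as pd

# Same rows as in the question, but shuffled on purpose to show that
# sorting by time makes the result independent of the input order.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2018-05-01 23:34:00",
        "2018-04-29 02:12:00",
        "2018-05-01 05:48:00",
        "2018-04-30 01:58:00",
        "2018-04-29 02:02:00",
    ]),
    "col1": [1, 1, 2, 1, 2],
    "col2": [11, 50, 25, 40, 10],
    "col3": [3, 3, 4, 3, 4],
})

# Sort chronologically, then take the last row of every (col1, col3) group.
last_rows = (
    df.sort_values("time")
      .groupby(["col1", "col3"], as_index=False)
      .last()
)

print(last_rows)
```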

Now we can calculate the time difference between the last and second-to-last entry of each col1 group and compare it with pd.Timedelta(hours=24), which returns True or False per group. This assumes the time column has a datetime dtype and that every group has at least two rows.

>>> gap = df.groupby('col1')['time'].apply(lambda s: s.sort_values().diff().iloc[-1])
>>> last_diff['time_diff'] = (gap > pd.Timedelta(hours=24)).values

>>> last_diff
   col1                time  col2  col3  time_diff
0     1 2018-05-01 23:34:00    11     3       True
1     2 2018-05-01 05:48:00    25     4       True
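
To get all the way to the output asked for in the question, something along these lines might work. It is only a sketch under a few assumptions: time is a datetime column, a group with a single row simply returns its one value, and the helper name summarize and the constant THRESHOLD are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime([
        "2018-04-29 02:12:00",
        "2018-04-30 01:58:00",
        "2018-05-01 23:34:00",
        "2018-04-29 02:02:00",
        "2018-05-01 05:48:00",
    ]),
    "col1": [1, 1, 1, 2, 2],
    "col2": [50, 40, 11, 10, 25],
    "col3": [3, 3, 3, 4, 4],
})

THRESHOLD = pd.Timedelta(hours=24)  # gap above which the older value is nulled


def summarize(group):
    """Return [last col2, second-to-last col2 or None] for one group."""
    group = group.sort_values("time")
    last_val = int(group["col2"].iloc[-1])
    if len(group) < 2:
        # Assumption: with no second-to-last entry there is nothing to compare.
        return [last_val]
    gap = group["time"].iloc[-1] - group["time"].iloc[-2]
    # Null out the second-to-last value when the gap exceeds 24 hours.
    prev_val = None if gap > THRESHOLD else int(group["col2"].iloc[-2])
    return [last_val, prev_val]


rows = []
for (c1, c3), group in df.groupby(["col1", "col3"]):
    rows.append({"col1": int(c1), "col2": summarize(group), "col3": int(c3)})

result = pd.DataFrame(rows)
print(result)
```

With the sample data both gaps are longer than 24 hours (about 45.6 hours for col1 = 1 and about 51.8 hours for col1 = 2), so col2 comes out as [11, None] and [25, None].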

Upvotes: 1
