Gerges
Gerges

Reputation: 6499

Get the value of another column when current column was last non-zero (Pandas dataframe)

I have a dataframe similar to this one containing an enumerated date, and ID, and number of items sold for given date and ID:

Date  ID  num_sold
0     1   0
      2   13
1     1   6
      2   0
2     1   0
      2   0
3     1   5
      2   4

For each ID, I want to compute the Date when last sale happened. So I have the following dataframe:

Date  ID  num_sold  last_sale
0     1   0         -1
      2   13        -1
1     1   6         -1
      2   0         0
2     1   0         1
      2   0         0
3     1   5         1
      2   4         0

I want to compute the column last_sale, where 1 and 0 are the date values when given an ID, num_sold was last non-zero. If there is no such last date (e.g. first month in dataset), it is -1 (or could be nan).

In the given example,

At date 1, ID 1 was not sold ever before so we set last_sale to -1. ID 2 was last sold in month 0, so we set last_sale to 0.

Similarly, at date 3, ID 1 was last sold in month 1 so last_sale is 1, and ID 2 was last sold in month 0, so last_sale is 0.

What's the most efficient and concise way to accomplish this?

Upvotes: 2

Views: 274

Answers (2)

Charles R
Charles R

Reputation: 1661

First you create a column for the current sale's date :

df["current_sale_date"] = 0
df.loc[df.num_sold != 0, "current_sale_date"] = df.Date

You create an other column for the last sales's date. First you replace 0 into np.nan in order to allow .ffill() to work. then for each ID, you get the value of the current date of sale thanks to .fill().

df.current_sale_date = df.current_sale_date.replace(0, np.nan)    
df["last_sale_date"] = df.groupby(['ID'])['current_sale_date'].ffill()

Then you get the previous date of sale with shift by ID

 df.last_sale_date = df.groupby(['ID'])['last_sale_date'].shift()

For the first date, you set -1

df.loc[df.Date == df.Date.idxmin(), "last_sale_date"] = -1

You replace back nan into 0 because 0 is a date

df.last_sale_date = df.last_sale_date.replace(np.nan, 0)

For the first time an ID is sold, if there is no previous date of sale, you set -1

 df.loc[(df.current_sale_date == 1) & (df.last_sale_date == 0), "last_sale_date"] = -1

Upvotes: 3

Sergei
Sergei

Reputation: 470

First, Let's create your DataFrame df1.

import pandas as pd

df1 = pd.DataFrame({'Date': ['0', '0', '1', '1', '2', '2', '3', '3'], 
                    'ID': ['1', '2', '1', '2', '1', '2', '1', '2'],
                    'num_sold': [0, 13, 6, 0, 0, 0, 5, 4]})
print(df1)

Output:

  Date ID  num_sold
0    0  1         0
1    0  2        13
2    1  1         6
3    1  2         0
4    2  1         0
5    2  2         0
6    3  1         5
7    3  2         4

Now we'll group it by ID, apply your logic to every ID group and concatenate these groups.

groups = []
for name, group in df1.groupby(['ID']):
    group.loc[group['num_sold'].shift(1)!=0, 'last_sale'] = group['Date'].shift(1)
    group['last_sale'] = group['last_sale'].fillna(method='ffill').fillna('-1')
    groups.append(group)

df2 = pd.concat(groups).sort_values(['Date', 'ID'])
print(df2)

Your result:

  Date ID  num_sold last_sale
0    0  1         0        -1
1    0  2        13        -1
2    1  1         6        -1
3    1  2         0         0
4    2  1         0         1
5    2  2         0         0
6    3  1         5         1
7    3  2         4         0

Good luck!

Upvotes: 1

Related Questions