Liu Yu
Liu Yu

Reputation: 391

How to resample/reorganize dataframe

This is my dataframe.

 df1=pd.DataFrame({'user':['A','A','A','A','B','B','B'], 'status': 
 [1,0,1,0,1,0,1],'bindate':['2019-1-20','','2019-1-27','','2019-1- 
  2','','2019-1-25'],'unbindate':['','2019-1-25','','2019-1-30','','2019- 
 1-20','']})

It looks like this

  user   stutas  bindate      unbindate
0  A       1     2019-01-20   Nat
1  A       0     Nat          2019-01-25
2  A       1     2019-01-27   Nat
3  A       0     Nat          2019-01-30
4  B       1     2019-01-02   Nat
5  B       0     Nat          2019-01-20
6  B       1     2019-01-25   Nat

I want to use Pandas to come up with a new dataframe that looks like below

  user     bindate      unbindate
0  A       2019-01-20   2019-01-25
1  A       2019-01-27   2019-01-30
2  B       2019-01-02   2019-01-20
3  B       2019-01-25   None   

Our server captures the date when user becomes the member as bindate, and captures the date when user exists the member as unbindate. If it is a member, status=1, else status=0

I want to mash the records to create a date range to show the user as a member. Notice user A binds and unbinds twice, so A is not the member in the end. User B binds twice, unbinds once, so B is the member in the end, hence I leave the unbindate as None. I tried to use .loc to grab the data, no luck. Is there a better way to achieve this? Thank you.

Upvotes: 1

Views: 59

Answers (2)

Serge Ballesta
Serge Ballesta

Reputation: 148870

If your dataframe is exactly as you have shown with interleaved bind and unbind, shift could be enough:

df1['unbindate'] = df1.unbindate.shift(-1)
df1 = df1.loc[df1.status != 0].reset_index(drop=True).drop(columns='status')

It gives as expected:

  user    bindate  unbindate
0    A  2019-1-20  2019-1-25
1    A  2019-1-27  2019-1-30
2    B  2019-1-02  2019-1-20
3    B  2019-1-25        NaN

Upvotes: 0

BENY
BENY

Reputation: 323226

Not quit efficient but work groupby sorted the value by isnull

df=df.mask(df=='Nat')
df.groupby('user').apply(lambda x : x.apply(lambda y : sorted(y,key=pd.isnull))).\
    dropna(subset=['bindate','unbindate'],thresh=1)
Out[64]: 
  user  stutas     bindate   unbindate
0    A       1  2019-01-20  2019-01-25
1    A       0  2019-01-27  2019-01-30
4    B       1  2019-01-02  2019-01-20
5    B       0  2019-01-25         NaN

Upvotes: 1

Related Questions