146 percent Russian
146 percent Russian

Reputation: 2096

Merge DataFrames on condition

I have 2 DataFrames

df1 consists of columns: person_id,day,flag

df2 consists of columns: person_id,day_start,day_end

I want to add to df2 column num_flags which will show sum of flag column within person_id and day_start < day < day_end interval.

Which is the fastest way to do this without complicated loops? I'm seeking for fast solution like merge

Upvotes: 1

Views: 412

Answers (1)

Hng
Hng

Reputation: 148

>>> df = pd.merge(df1,df2, on="person_id", how="outer")
>>> df["lies_between"] = df.day.between(df.day_start, df.day_end,inclusive=False)
>>> x = pd.pivot_table(df,values="flags",columns="lies_between", index="person_id",aggfunc=np.sum)
>>> x.reset_index(drop=False,inplace=True)
>>> x[["person_id", True]]

This can be helpful:

>>> help(pandas.Series.between_time)

Upvotes: 1

Related Questions