Reputation: 1485
I am trying to add all values from pandas DataFrames df1.Number
to df2.Number
where the date ranges match.
I have a created df1
which looks as such:
Start End Number
0 2000-01-01 2000-01-08 1
1 2000-01-02 2000-01-09 2
2 2000-01-03 2000-01-10 3
3 2000-01-04 2000-01-11 4
4 2000-01-05 2000-01-12 5
5 2000-01-06 2000-01-13 6
6 2000-01-07 2000-01-14 7
and df2 which looks as such:
Dates Number
0 2000-01-01
1 2000-01-02
2 2000-01-03
3 2000-01-04
4 2000-01-05
5 2000-01-06
6 2000-01-07
7 2000-01-08
8 2000-01-09
9 2000-01-10
10 2000-01-11
11 2000-01-12
12 2000-01-13
13 2000-01-14
and I am trying to assign Number
from df1
to Number
in df2
if the date range in the rows match
The code I have used is for the first row is:
i = pd.date_range(df1.Start[0], df1.End[0])
if df2.Dates in i:
df2.Number.assign(df1.Number)
which does not throw an error but also does not assign anything to Number
. Expected output should be as follows:
Dates Number
0 2000-01-01 1
1 2000-01-02 1,2
2 2000-01-03 1,2,3
3 2000-01-04 1,2,3,4
4 2000-01-05 1,2,3,4,5
5 2000-01-06 1,2,3,4,5,6
6 2000-01-07 1,2,3,4,5,6,7
7 2000-01-08 1,2,3,4,5,6,7
8 2000-01-09 2,3,4,5,6,7
9 2000-01-10 3,4,5,6,7
10 2000-01-11 4,5,6,7
11 2000-01-12 5,6,7
12 2000-01-13 6,7
13 2000-01-14 7
Can anyone suggest why nothing is being assigned or a code that would do this for all rows? Thanks.
Upvotes: 1
Views: 1110
Reputation: 88226
Here's a way to do it:
s = df2.apply(lambda x: [((df1.loc[i,'Start'] < x) & (df1.loc[i,'End'] > x))
for i in range(len(df1))], axis = 1)
df2.assign(**s.apply(lambda x: np.unique(np.cumsum(x)) + 1).to_frame(name='Number'))
Dates Number
0 2000-01-01 [1]
1 2000-01-02 [2]
2 2000-01-03 [2, 3]
3 2000-01-04 [2, 3, 4]
4 2000-01-05 [2, 3, 4, 5]
5 2000-01-06 [2, 3, 4, 5, 6]
6 2000-01-07 [2, 3, 4, 5, 6, 7]
7 2000-01-08 [1, 2, 3, 4, 5, 6, 7]
8 2000-01-09 [1, 2, 3, 4, 5, 6]
9 2000-01-10 [1, 2, 3, 4, 5]
10 2000-01-11 [1, 2, 3, 4]
11 2000-01-12 [1, 2, 3]
12 2000-01-13 [1, 2]
13 2000-01-14 [1]
Note
Make sure the dates are in datetime format:
df1.Start = pd.to_datetime(df1.Start)
df1.End = pd.to_datetime(df1.End)
df2.Dates = pd.to_datetime(df2.Dates)
Upvotes: 2
Reputation: 323226
Using numpy
broadcast with dot
s1=df1.Start.values
s2=df1.End.values
s=df2.Dates.values[:,None]
pd.Series(((s1<=s)&(s2>=s)).dot(df1.Number.astype(str)+','),index=df2.index).str[:-1]
#df2['Number']=pd.Series(((s1<=s)&(s2>=s)).dot(df1.Number.astype(str)+','),index=df2.index).str[:-1]
Out[229]:
0 1
1 1,2
2 1,2,3
3 1,2,3,4
4 1,2,3,4,5
5 1,2,3,4,5,6
6 1,2,3,4,5,6,7
7 1,2,3,4,5,6,7
8 2,3,4,5,6,7
9 3,4,5,6,7
10 4,5,6,7
11 5,6,7
12 6,7
13 7
dtype: object
Upvotes: 4