Reputation: 8966
I'm trying to split one date list by using another. So:
import pandas as pd
d = {'date':['1/15/2015','2/15/2015'], 'num':[1,2]}
s = {'split':['2/1/2015']}
df = pd.DataFrame(d)
sf = pd.DataFrame(s)
df['date'] = pd.to_datetime(df['date'])
sf['split'] = pd.to_datetime(sf['split'])
df['date'].split_by(sf['split'])
would yield:
        date  num
0 2015-01-15  1.0
1 2015-02-01  NaN
2 2015-02-15  2.0
...but of course, it doesn't. I'm sure there's a simple merge or join I'm missing here, but I can't figure it out. Thanks.
Also, if the 'split' list has multiple dates, some of which fall outside the range of the 'date' list, I don't want them included. So basically, the extents of the new range would be the same as the old.
(side note: if there's a better way to convert a dictionary to a DataFrame and immediately convert the date strings to datetimes, that would be icing on the cake)
Upvotes: 1
Views: 73
Reputation: 862691
I think you need boolean indexing to filter sf by the min and max of column date in df first, and then concat with sort_values. To align the columns, rename split to date:
import pandas as pd
d = {'date':['1/15/2015','2/15/2015'], 'num':[1,2]}
s = {'split':['2/1/2015', '2/1/2016', '2/1/2014']}
df = pd.DataFrame(d)
sf = pd.DataFrame(s)
df['date'] = pd.to_datetime(df['date'])
sf['split'] = pd.to_datetime(sf['split'])
print (df)
        date  num
0 2015-01-15    1
1 2015-02-15    2
print (sf)
       split
0 2015-02-01
1 2016-02-01
2 2014-02-01
mask = (sf.split <= df.date.max()) & (sf.split >= df.date.min())
print (mask)
0     True
1    False
2    False
Name: split, dtype: bool
sf = sf[mask]
print (sf)
       split
0 2015-02-01
df = pd.concat([df, sf.rename(columns={'split':'date'})]).sort_values('date')
print (df)
        date  num
0 2015-01-15  1.0
0 2015-02-01  NaN
1 2015-02-15  2.0
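Note that the concatenated frame keeps the original index labels (0, 0, 1). If you want a clean 0, 1, 2 index as in the question, the same steps can be wrapped up roughly like this. This is only a sketch: split_by is a hypothetical helper name borrowed from the question's pseudocode, not a pandas method, and it uses Series.between (inclusive on both ends by default) in place of the explicit <= / >= mask. It also touches on the side note by calling pd.to_datetime while building the DataFrame, which should avoid the separate conversion step.

```python
import pandas as pd


def split_by(df, split_dates, date_col="date"):
    """Add split dates that fall inside df's date range as new rows.

    Hypothetical helper: the name and signature are illustrative only.
    """
    split_dates = pd.Series(split_dates)
    # Keep only split dates within [min, max] of the existing dates.
    in_range = split_dates.between(df[date_col].min(), df[date_col].max())
    # Name the column like the target column so concat aligns on it.
    new_rows = split_dates[in_range].to_frame(name=date_col)
    return (
        pd.concat([df, new_rows])
        .sort_values(date_col)
        .reset_index(drop=True)  # replace the duplicated labels with 0..n-1
    )


# Convert the date strings while constructing the frame.
df = pd.DataFrame({
    "date": pd.to_datetime(["1/15/2015", "2/15/2015"]),
    "num": [1, 2],
})
splits = pd.to_datetime(["2/1/2015", "2/1/2016", "2/1/2014"])

result = split_by(df, splits)
print(result)
```

Only the 2015-02-01 split lies between the two existing dates, so it should be the only row added, with NaN in num.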
Upvotes: 1