elPastor

Split a pandas date list based on another pandas date list

I'm trying to split one date list using another, i.e. insert each date from the second list into the first as a new row. For example:

import pandas as pd

d = {'date':['1/15/2015','2/15/2015'], 'num':[1,2]}
s = {'split':['2/1/2015']}

df = pd.DataFrame(d)
sf = pd.DataFrame(s)

df['date'] = pd.to_datetime(df['date'])
sf['split'] = pd.to_datetime(sf['split'])

df['date'].split_by(sf['split'])  # what I'd like; split_by is not a real pandas method

would yield:

        date    num
0 2015-01-15    1.0
1 2015-02-01    NaN  
2 2015-02-15    2.0

...but of course, it doesn't. I'm sure there's a simple merge or join I'm missing here, but I can't figure it out. Thanks.
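Since a merge is suspected here: a minimal sketch, assuming the `split` column is renamed to `date` so both frames share a key, is an outer merge. It keeps every date from either frame and leaves `num` as NaN for dates that only appear in `sf`. On its own it does not drop split dates outside the range of `df['date']`, so that filter would still be a separate step.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['1/15/2015', '2/15/2015']), 'num': [1, 2]})
sf = pd.DataFrame({'split': pd.to_datetime(['2/1/2015'])})

# Give both frames the same key column, then keep every date from either side.
# Rows that exist only in sf get NaN for 'num' (which turns 'num' into float).
out = (
    df.merge(sf.rename(columns={'split': 'date'}), on='date', how='outer')
    .sort_values('date')
    .reset_index(drop=True)
)
print(out)
```

The explicit `sort_values` and `reset_index(drop=True)` are there so the row order and index do not depend on how a particular pandas version orders outer-merge keys.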

Also, if the 'split' list has multiple dates, some of which fall outside the range of the 'date' list, I don't want those included. In other words, the new range should have the same extents (earliest and latest date) as the old one.

(side note: if there's a better way to convert a dictionary to a DataFrame and immediately convert the date strings to datetimes, that would be icing on the cake)
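On the side note, two ways to get datetimes while building the frame are sketched here, assuming nothing beyond pandas itself: `DataFrame.assign` converts the column in one chained expression, or the strings can be passed through `pd.to_datetime` before the frame is constructed.

```python
import pandas as pd

d = {'date': ['1/15/2015', '2/15/2015'], 'num': [1, 2]}

# Option 1: build the frame and convert 'date' in a single chained expression.
df = pd.DataFrame(d).assign(date=lambda x: pd.to_datetime(x['date']))

# Option 2: convert the strings first; {**d, ...} copies d and overrides 'date'.
df2 = pd.DataFrame({**d, 'date': pd.to_datetime(d['date'])})

print(df.dtypes)
```

Both leave the original dict `d` unchanged; which one reads better is mostly a matter of taste.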

Answers (1)

jezrael

I think you need boolean indexing to first filter sf by the min and max of column date in df, and then concat with sort_values; to align the columns, the split column needs to be renamed to date:

import pandas as pd

d = {'date':['1/15/2015','2/15/2015'], 'num':[1,2]}
s = {'split':['2/1/2015', '2/1/2016', '2/1/2014']}

df = pd.DataFrame(d)
sf = pd.DataFrame(s)

df['date'] = pd.to_datetime(df['date'])
sf['split'] = pd.to_datetime(sf['split'])
print (df)
        date  num
0 2015-01-15    1
1 2015-02-15    2

print (sf)
       split
0 2015-02-01
1 2016-02-01
2 2014-02-01

mask = (sf.split <= df.date.max()) & (sf.split >= df.date.min())
print (mask)
0     True
1    False
2    False
Name: split, dtype: bool

sf = sf[mask]
print (sf)
       split
0 2015-02-01

df = pd.concat([df, sf.rename(columns={'split':'date'})]).sort_values('date')
print (df)
        date  num
0 2015-01-15  1.0
0 2015-02-01  NaN
1 2015-02-15  2.0
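The result above keeps the original index labels, so two rows share label 0. If a clean 0..n-1 index like the one in the question is wanted, the same filter, concat and sort steps can be wrapped up as below. `insert_split_dates` is a hypothetical helper name, and `Series.between` (inclusive at both ends by default) stands in for the explicit `<=` / `>=` mask.

```python
import pandas as pd


def insert_split_dates(df, sf):
    # Hypothetical helper: adds the sf['split'] dates that fall inside the
    # [min, max] range of df['date'] as new rows, sorted by date.
    lo, hi = df['date'].min(), df['date'].max()
    # between() is inclusive at both ends by default, like the <= / >= mask
    in_range = sf.loc[sf['split'].between(lo, hi), ['split']]
    out = pd.concat([df, in_range.rename(columns={'split': 'date'})])
    # reset_index(drop=True) replaces the duplicated labels (0, 0, 1) with 0, 1, 2
    return out.sort_values('date').reset_index(drop=True)


df = pd.DataFrame({'date': pd.to_datetime(['1/15/2015', '2/15/2015']), 'num': [1, 2]})
sf = pd.DataFrame({'split': pd.to_datetime(['2/1/2015', '2/1/2016', '2/1/2014'])})
print(insert_split_dates(df, sf))
```

As in the answer, `num` becomes a float column because of the inserted NaN.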
