Reputation: 2149
I am trying to get the min/max dates from columns that might not have a date.
I have tried filling the blanks with things like '', "NaN", True, False, 0, etc.
I first set 'p_date' and 's_date' from the actual date, based on whether the action is '+' or '-':
df_td['p_date'] = np.where(df_td.action == '+', df_td.date, ???)
df_td['s_date'] = np.where(df_td.action == '-', df_td.date, ???)
So I need the min of the p_date & max of the s_date here.
   issue      p_date      s_date
0  issue  2012-11-01
1  issue  2013-12-09
2  issue  2014-12-08
3  issue              2016-01-13
4  issue  2012-11-01
5  issue              2014-03-26
6  issue              2015-05-29
7  issue  2013-12-18
8  issue              2016-01-13
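For anyone trying to reproduce this, the frame above can be rebuilt like so (a sketch; the missing values are left as empty strings, which is what triggers the problem below):

```python
import datetime
import pandas as pd

# blanks stand in for the missing dates, matching the frame shown above
rows = [
    ("issue", datetime.date(2012, 11, 1), ""),
    ("issue", datetime.date(2013, 12, 9), ""),
    ("issue", datetime.date(2014, 12, 8), ""),
    ("issue", "", datetime.date(2016, 1, 13)),
    ("issue", datetime.date(2012, 11, 1), ""),
    ("issue", "", datetime.date(2014, 3, 26)),
    ("issue", "", datetime.date(2015, 5, 29)),
    ("issue", datetime.date(2013, 12, 18), ""),
    ("issue", "", datetime.date(2016, 1, 13)),
]
df = pd.DataFrame(rows, columns=["issue", "p_date", "s_date"])
print(df)
```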
I then do a group by:
g = df_td.groupby(['name', 'type'], as_index=False).agg(
    { ...
      'p_date': 'min',
      's_date': 'max'
    })
This gives me the following error:
'<=' not supported between instances of 'datetime.date' and 'str'
If I make up a date for the blank spaces it does work, but this is not a great option.
What can I fill the date with to get this to work?
Thanks.
Upvotes: 1
Views: 7752
Reputation: 353589
In pandas, NaN is used as the missing value, and it is ignored by most operations, so it's the right one to use. If you're still getting an error, it's probably because you have datetime.date objects in there (well, you definitely have those there; I mean that they are probably what's causing the problem).
For example, if your missing values are "" and your column dtypes are object with internal types of datetime.date, I get:
In [496]: df.groupby("issue").agg({"p_date": "min", "s_date": "max"})
[...]
TypeError: '<=' not supported between instances of 'datetime.date' and 'str'
but if I switch to pandas-native time objects and NaNs, it works:
In [500]: df["p_date"] = pd.to_datetime(df["p_date"])
In [501]: df["s_date"] = pd.to_datetime(df["s_date"])
In [502]: df
Out[502]:
issue p_date s_date
0 issue 2012-11-01 NaT
1 issue 2013-12-09 NaT
2 issue 2014-12-08 NaT
3 issue NaT 2016-01-13
4 issue 2012-11-01 NaT
5 issue NaT 2014-03-26
6 issue NaT 2015-05-29
7 issue 2013-12-18 NaT
8 issue NaT 2016-01-13
In [503]: df.groupby("issue").agg({"p_date": "min", "s_date": "max"})
Out[503]:
p_date s_date
issue
issue 2012-11-01 2016-01-13
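You can also avoid the mixed types from the start. As a sketch (not from the original post), Series.where keeps the datetime64 dtype and fills NaT wherever the condition is false, so no placeholder value is needed at all:

```python
import pandas as pd

# toy frame standing in for df_td; column names assumed from the question
df_td = pd.DataFrame({
    "action": ["+", "-", "+", "-"],
    "date": pd.to_datetime(["2012-11-01", "2016-01-13",
                            "2013-12-09", "2014-03-26"]),
})

# Series.where keeps the value where the condition holds and fills NaT
# elsewhere, because the column is already datetime64
df_td["p_date"] = df_td["date"].where(df_td.action == "+")
df_td["s_date"] = df_td["date"].where(df_td.action == "-")

print(df_td["p_date"].min(), df_td["s_date"].max())
```

Since NaT is skipped by min/max, the aggregation then works without any sentinel dates.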
Upvotes: 2
Reputation: 2149
I am not happy with my answer, but it seems to work.
I created floor and ceiling dates.
floor = datetime.date(1900, 1, 1)
ceil = datetime.date(2100, 1, 1)

df_td['p_date'] = np.where(df_td.action == '+', df_td.date, ceil)
df_td['s_date'] = np.where(df_td.action == '-', df_td.date, floor)
That way the sentinel values never win the min/max aggregation, which needs actual date objects to work.
I would still like a more appropriate suggestion. Thanks.
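For reference, a runnable sketch of this workaround (the frame here is a made-up stand-in for df_td):

```python
import datetime

import numpy as np
import pandas as pd

floor = datetime.date(1900, 1, 1)
ceil = datetime.date(2100, 1, 1)

# minimal stand-in for df_td; real data assumed from the question
df_td = pd.DataFrame({
    "name": ["a", "a"],
    "type": ["t", "t"],
    "action": ["+", "-"],
    "date": [datetime.date(2012, 11, 1), datetime.date(2016, 1, 13)],
})

# fill the non-matching rows with sentinel dates so every cell is a date
df_td["p_date"] = np.where(df_td.action == "+", df_td.date, ceil)
df_td["s_date"] = np.where(df_td.action == "-", df_td.date, floor)

# the sentinels lose to any real date under min/max
g = df_td.groupby(["name", "type"], as_index=False).agg(
    {"p_date": "min", "s_date": "max"})
print(g)
```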
Upvotes: 0