Reputation: 13
A and B are non-numeric columns. Neither A nor B has NaN values. However, the dataframe has NaN values in other columns.
I found a possibly related GitHub issue, https://github.com/pandas-dev/pandas/issues/32077, but I am not sure it is relevant. I think the upgrade is causing the issue.
trepos = prdf.groupby(['A','B']).agg('max').reset_index()[['A', 'B']].apply(lambda x: f'{x.A}/{x.B}', axis=1).values
I want to migrate this code from an older pandas version to pandas 1.1.5.
The above code works fine in pandas 0.22.0. However, it breaks in pandas 1.1.5 with the following error:
```
/tmp/ipykernel_283/1981918777.py in <module>
1 # release tags
----> 2 trepos = prdf.groupby(['A','B']).agg('max').reset_index()[['A', 'B']]#.apply(lambda x: f'{x.A}/{x.B}', axis=1).values
/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/generic.py in aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
949 As usual, the aggregation can be a callable or a string alias.
950
--> 951 See :ref:`groupby.aggregate.named` for more.
952
953 .. versionchanged:: 1.3.0
/opt/conda/lib/python3.7/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
305 # We need this defined here for mypy
306 raise AbstractMethodError(self)
--> 307
308 @property
309 def ndim(self) -> int:
/opt/conda/lib/python3.7/site-packages/pandas/core/base.py in _try_aggregate_string_function(self, arg, *args, **kwargs)
261 """
262
--> 263 # ndarray compatibility
264 __array_priority__ = 1000
265 _hidden_attrs: frozenset[str] = frozenset(
/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in max(self, numeric_only, min_count)
1558 @final
1559 @Substitution(name="groupby")
-> 1560 @Appender(_common_see_also)
1561 def any(self, skipna: bool = True):
1562 """
/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in _agg_general(self, numeric_only, min_count, alias, npfunc)
999 # Dispatch/Wrapping
1000
-> 1001 @final
1002 def _concat_objects(self, keys, values, not_indexed_same: bool = False):
1003 from pandas.core.reshape.concat import concat
/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/generic.py in _cython_agg_general(self, how, alt, numeric_only, min_count)
1020
1021 if isinstance(sobj, Series):
-> 1022 # GH#35246 test_groupby_as_index_select_column_sum_empty_df
1023 result.columns = self._obj_with_exclusions.columns.copy()
1024 else:
/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/generic.py in _cython_agg_blocks(self, how, alt, numeric_only, min_count)
1122
1123 def _aggregate_item_by_item(self, func, *args, **kwargs) -> DataFrame:
-> 1124 # only for axis==0
1125 # tests that get here with non-unique cols:
1126 # test_resample_with_timedelta_yields_no_empty_groups,
/opt/conda/lib/python3.7/site-packages/pandas/core/internals/blocks.py in make_block(self, values, placement)
252 if placement is None:
253 placement = self._mgr_locs
--> 254 if self.is_extension:
255 values = ensure_block_shape(values, ndim=self.ndim)
256
/opt/conda/lib/python3.7/site-packages/pandas/core/internals/blocks.py in make_block(values, placement, klass, ndim, dtype)
/opt/conda/lib/python3.7/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
/opt/conda/lib/python3.7/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
129 """
130 If we have a multi-column block, split and operate block-wise. Otherwise
--> 131 use the original method.
132 """
133
ValueError: Wrong number of items passed 4, placement implies 5
```
For example, the code below works fine in 0.22.0:
```
import numpy as np
import pandas as pd

df_simple_max = pd.DataFrame({
    'key': ['a', 'a', 'b', 'b', 'c', 'c'],
    'data': ['e', 'e', 'f', 'f', 'g', 'g'],
    'good_string': ['cat', 'dog', 'cat', 'dog', 'fish', 'pig'],
    'bad_string': ['cat', np.nan, np.nan, np.nan, np.nan, np.nan],
})
(df_simple_max.groupby(['key', 'data']).agg('max').reset_index()[['key', 'data']]
 .apply(lambda x: f'{x.key}/{x.data}', axis=1).values)
```
And the output is:
array(['a/<memory at 0x7fb181255108>', 'b/<memory at 0x7fb181255108>',
'c/<memory at 0x7fb181255108>'], dtype=object)
But it breaks on pandas 1.1.5.
Upvotes: 0
Views: 286
Reputation: 13
pandas 1.1.5 has a bug when aggregating with max on grouped DataFrames. It was fixed by 1.3.1: the code above runs fine on pandas 1.3.1. Hence closing the ticket.
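If upgrading is not an option, one possible workaround is to skip the aggregation altogether, since the original line only uses the group keys and discards the max values. The sketch below reuses the example frame from the question and builds the key/data strings from the distinct key pairs. It assumes, as stated in the question, that the key columns contain no NaN values, and I have not run it on 1.1.5 specifically. It also joins the columns directly instead of using `x.data`; the `<memory at ...>` values in the 0.22.0 output are probably from attribute access picking up the old `Series.data` attribute rather than the column named `data`.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
df_simple_max = pd.DataFrame({
    'key': ['a', 'a', 'b', 'b', 'c', 'c'],
    'data': ['e', 'e', 'f', 'f', 'g', 'g'],
    'good_string': ['cat', 'dog', 'cat', 'dog', 'fish', 'pig'],
    'bad_string': ['cat', np.nan, np.nan, np.nan, np.nan, np.nan],
})

# Keep only the key columns, take the distinct pairs, and sort them so the
# order matches what groupby would produce for its (sorted) group keys.
pairs = (
    df_simple_max[['key', 'data']]
    .drop_duplicates()
    .sort_values(['key', 'data'])
)

# Join the two string columns directly; no row-wise apply and no
# attribute-style column access is needed.
trepos = (pairs['key'] + '/' + pairs['data']).tolist()
print(trepos)
```

Because no aggregation runs over the other columns, the NaN-heavy object columns never reach the code path that raises the ValueError.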
Upvotes: 1