Vivek Shukla

Reputation: 13

Pandas version upgrade causes ValueError when using groupby and aggregate max

A and B are non-numeric columns. Neither A nor B contains NaN values. However, the dataframe has NaN values in other columns.

I found a possibly related GitHub issue: https://github.com/pandas-dev/pandas/issues/32077. I am not sure it is relevant, but I think the upgrade is causing the problem.

trepos = prdf.groupby(['A','B']).agg('max').reset_index()[['A', 'B']].apply(lambda x: f'{x.A}/{x.B}', axis=1).values

I want to migrate this code from an older pandas version to pandas 1.1.5.

The above code works fine in pandas 0.22.0. However, it breaks in pandas 1.1.5 with the following error:

```
/tmp/ipykernel_283/1981918777.py in <module>
      1 # release tags
----> 2 trepos = prdf.groupby(['A','B']).agg('max').reset_index()[['A', 'B']]#.apply(lambda x: f'{x.A}/{x.B}', axis=1).values

/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/generic.py in aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
    949       As usual, the aggregation can be a callable or a string alias.
    950 
--> 951     See :ref:`groupby.aggregate.named` for more.
    952 
    953     .. versionchanged:: 1.3.0

/opt/conda/lib/python3.7/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
    305         # We need this defined here for mypy
    306         raise AbstractMethodError(self)
--> 307 
    308     @property
    309     def ndim(self) -> int:

/opt/conda/lib/python3.7/site-packages/pandas/core/base.py in _try_aggregate_string_function(self, arg, *args, **kwargs)
    261     """
    262 
--> 263     # ndarray compatibility
    264     __array_priority__ = 1000
    265     _hidden_attrs: frozenset[str] = frozenset(

/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in max(self, numeric_only, min_count)
   1558     @final
   1559     @Substitution(name="groupby")
-> 1560     @Appender(_common_see_also)
   1561     def any(self, skipna: bool = True):
   1562         """

/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in _agg_general(self, numeric_only, min_count, alias, npfunc)
    999     # Dispatch/Wrapping
   1000 
-> 1001     @final
   1002     def _concat_objects(self, keys, values, not_indexed_same: bool = False):
   1003         from pandas.core.reshape.concat import concat

/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/generic.py in _cython_agg_general(self, how, alt, numeric_only, min_count)
   1020 
   1021                     if isinstance(sobj, Series):
-> 1022                         # GH#35246 test_groupby_as_index_select_column_sum_empty_df
   1023                         result.columns = self._obj_with_exclusions.columns.copy()
   1024                     else:

/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/generic.py in _cython_agg_blocks(self, how, alt, numeric_only, min_count)
   1122 
   1123     def _aggregate_item_by_item(self, func, *args, **kwargs) -> DataFrame:
-> 1124         # only for axis==0
   1125         # tests that get here with non-unique cols:
   1126         #  test_resample_with_timedelta_yields_no_empty_groups,

/opt/conda/lib/python3.7/site-packages/pandas/core/internals/blocks.py in make_block(self, values, placement)
    252         if placement is None:
    253             placement = self._mgr_locs
--> 254         if self.is_extension:
    255             values = ensure_block_shape(values, ndim=self.ndim)
    256 

/opt/conda/lib/python3.7/site-packages/pandas/core/internals/blocks.py in make_block(values, placement, klass, ndim, dtype)

/opt/conda/lib/python3.7/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)

/opt/conda/lib/python3.7/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
    129     """
    130     If we have a multi-column block, split and operate block-wise.  Otherwise
--> 131     use the original method.
    132     """
    133 

ValueError: Wrong number of items passed 4, placement implies 5
```

For example, the code below works fine in 0.22.0:
```
import numpy as np
import pandas as pd
df_simple_max = pd.DataFrame({'key': ['a','a','b','b','c','c'], 'data' : ['e','e','f','f','g','g'],
                          'good_string' : ['cat','dog','cat','dog','fish','pig'],
                          'bad_string' : ['cat',np.nan,np.nan, np.nan, np.nan, np.nan]})
df_simple_max.groupby(['key','data']).agg('max').reset_index()[['key', 'data']].apply(lambda x: f'{x.key}/{x.data}', axis=1).values
```
The output is:
array(['a/<memory at 0x7fb181255108>', 'b/<memory at 0x7fb181255108>',
       'c/<memory at 0x7fb181255108>'], dtype=object)

However, the same code breaks on pandas 1.1.5.

Upvotes: 0

Views: 286

Answers (1)

Vivek Shukla

Reputation: 13

Pandas 1.1.5 has a bug when aggregating with max on a grouped DataFrame. This was fixed in 1.3.1: the code above runs fine on pandas 1.3.1, so I am closing this question.
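
If upgrading is not an option, one possible workaround is to avoid the aggregation altogether, since the final result only uses the two grouping columns. The snippet below is a minimal sketch built on the sample frame from the question, not a drop-in replacement for the real `prdf`. It assumes the key columns contain no NaN (as stated in the question) and that sorted, unique key pairs are all that is needed. The names `keys` and `repos` are only illustrative. It also uses bracket access rather than `x.data`: in older pandas, `Series.data` was an attribute exposing the underlying buffer, which is probably why the sample output above shows `<memory at ...>` instead of the column values.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; the real data (prdf) is assumed to look similar.
df = pd.DataFrame({
    'key': ['a', 'a', 'b', 'b', 'c', 'c'],
    'data': ['e', 'e', 'f', 'f', 'g', 'g'],
    'good_string': ['cat', 'dog', 'cat', 'dog', 'fish', 'pig'],
    'bad_string': ['cat', np.nan, np.nan, np.nan, np.nan, np.nan],
})

# Only the group keys are used downstream, so take the unique key pairs
# directly instead of running max over the other (NaN-containing) columns.
keys = df[['key', 'data']].drop_duplicates().sort_values(['key', 'data'])

# Bracket access avoids any clash between a column name and a Series attribute.
repos = (keys['key'] + '/' + keys['data']).values
print(repos)
```

On the sample data this should yield the three strings a/e, b/f and c/g. The `sort_values` call is there to mimic the sorted key order that `groupby` produces by default.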

Upvotes: 1
