Reputation: 2789
I have a DataFrame with the following columns:
>>> print(df.columns)
Index(['iteration0', 'iteration1', 'iteration2', 'iteration3', 'iteration4',
'iteration5', 'iteration6', 'iteration7', 'iteration8', 'iteration9',
'iteration10', 'iteration11', 'iteration12', 'iteration13',
'iteration14', 'iteration15', 'iteration16', 'iteration17',
'iteration18', 'iteration19', 'iteration20', 'iteration21',
'iteration22', 'iteration23', 'iteration24', 'iteration25',
'iteration26', 'iteration27', 'iteration28', 'iteration29',
'iteration30', 'iteration31', 'iteration32', 'iteration33',
'iteration34', 'iteration35', 'iteration36', 'iteration37',
'iteration38', 'iteration39', 'iteration40', 'iteration41',
'iteration42', 'iteration43', 'iteration44', 'iteration45',
'iteration46', 'iteration47', 'iteration48', 'iteration49',
'iteration50', 'iteration51', 'iteration52', 'iteration53',
'iteration54', 'iteration55', 'iteration56', 'iteration57',
'iteration58', 'iteration59', 'iteration60', 'iteration61',
'iteration62', 'iteration63', 'iteration64', 'iteration65',
'iteration66', 'iteration67', 'iteration68', 'iteration69',
'iteration70', 'iteration71', 'iteration72', 'iteration73',
'iteration74', 'iteration75', 'iteration76', 'iteration77',
'iteration78', 'iteration79', 'iteration80', 'iteration81',
'iteration82', 'iteration83', 'iteration84', 'iteration85',
'iteration86', 'iteration87', 'iteration88', 'iteration89',
'iteration90', 'iteration91', 'iteration92', 'iteration93',
'iteration94', 'iteration95', 'iteration96', 'iteration97',
'iteration98', 'iteration99'],
dtype='object')
Each row of the DataFrame is also indexed by its date:
>>> print(df.index)
Index(['05/12/2009', '05/13/2009', '05/14/2009', '05/15/2009', '05/18/2009',
'05/19/2009', '05/20/2009', '05/21/2009', '05/22/2009', '05/25/2009',
...
'10/23/2009', '10/26/2009', '10/27/2009', '10/28/2009', '10/29/2009',
'10/30/2009', '11/02/2009', '11/03/2009', '11/04/2009', '11/05/2009'],
dtype='object', name='Date', length=127)
So I have a DataFrame with 127 rows and 100 columns. Each value in it is 0, 1 or 2.
What I want is simply the mode of each row, that is, the most frequent value for each Date. Here is what I did:
most_frequent = df.mode(axis=1)
Then I store the mode of each row as a new column in another DataFrame:
local_df['ensemble'] = most_frequent
But when I run the code, I get this error:
File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 3370, in __setitem__
self._set_item(key, value)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 3446, in _set_item
NDFrame._set_item(self, key, value)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 3172, in _set_item
self._data.set(key, value)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/managers.py", line 1056, in set
self.insert(len(self.items), item, value)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/managers.py", line 1158, in insert
placement=slice(loc, loc + 1))
File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/blocks.py", line 3095, in make_block
return klass(values, ndim=ndim, placement=placement)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/blocks.py", line 87, in __init__
'{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
ValueError: Wrong number of items passed 2, placement implies 1
Printing the most_frequent DataFrame shows something odd:
09/25/2009 0.0 NaN
09/28/2009 0.0 NaN
09/29/2009 0.0 NaN
09/30/2009 1.0 NaN
10/01/2009 0.0 NaN
10/02/2009 0.0 NaN
10/05/2009 0.0 NaN
10/06/2009 1.0 NaN
10/07/2009 0.0 NaN
10/08/2009 0.0 NaN
10/09/2009 0.0 NaN
10/12/2009 0.0 NaN
10/13/2009 1.0 NaN
10/14/2009 0.0 NaN
10/15/2009 0.0 NaN
10/16/2009 0.0 NaN
10/19/2009 0.0 NaN
10/20/2009 0.0 NaN
10/21/2009 0.0 NaN
10/22/2009 0.0 NaN
10/23/2009 0.0 NaN
10/26/2009 0.0 NaN
10/27/2009 0.0 NaN
In other words, the result has an extra column.
I don't know whether that is what caused the problem. In any case, what was my mistake here?
Upvotes: 1
Views: 56
Reputation: 862511
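There is no mistake: the mode method can return more than one value, here per row, when several values tie for most frequent. Rows without a tie get NaN in the extra columns, which is why the result has two columns and cannot be assigned to a single new column.
So select the first column by position with DataFrame.iloc: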
local_df['ensemble'] = df.mode(axis=1).iloc[:, 0]
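To see why the extra column appears, here is a minimal sketch on a tiny made-up frame (the values, the three dates, and the empty local_df are assumptions for illustration, not the asker's data). The first row has a tie between 0 and 1, so mode(axis=1) needs a second column, and the rows without a tie are padded with NaN:

```python
import pandas as pd

# Hypothetical miniature of the asker's data: 3 dates, 4 iterations.
df = pd.DataFrame(
    {
        "iteration0": [0, 1, 2],
        "iteration1": [0, 1, 0],
        "iteration2": [1, 2, 1],
        "iteration3": [1, 1, 0],
    },
    index=pd.Index(["05/12/2009", "05/13/2009", "05/14/2009"], name="Date"),
)

# Row-wise mode: the first row ties between 0 and 1, so two columns come back.
most_frequent = df.mode(axis=1)
print(most_frequent)

# Keep only the first mode per row; it is never NaN here because every row
# has at least one value, so casting back to int is safe.
local_df = pd.DataFrame(index=df.index)
local_df["ensemble"] = most_frequent.iloc[:, 0].astype(int)
print(local_df)
```

Note that pandas returns the modes of each row in sorted order, so on a tie this picks the smallest of the tied values. If ties should be broken differently, that rule would have to be applied before selecting a column.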
Upvotes: 2