Reputation: 1398
In the following Python code I am trying to add three new columns to a pandas DataFrame by processing data from other columns.
import pandas as pd

class MyClass:
    def __init__(self):
        pass

    def foo( self, a, b, c, d ):
        print 'A: ', a,' B: ', b, ' C: ', c, ' D: ', d
        return 0.0, 0.0

if __name__=="__main__":
    myClass = MyClass()

    def bar(row):
        b = row['B']
        c = row['C']
        d = row['D']
        a = row['A']
        e, f = myClass.foo( a, b, c, d )
        return e, f, e + f

    df = pd.DataFrame({
        'A': [1522083365352316, 1522089025972228, 1522091257321565, 1522253707450381, 1522267174827558, 1522342541329606],
        'B': [ 'X', 'X', 'Y', 'Y', 'X', 'X' ],
        'C': [ 100, 100, 150, 50, 100, 57 ],
        'D': [ 26.11, 26.1, 26.2, 26.2, 26.06, 26.09 ]
    })
    df['A'] = pd.to_datetime(df['A'], unit = 'us')
    #print df
    df['E'], df['F'], df['G'] = zip(*df.apply(bar, axis = 1))
    print df
On running this I get the following error.
Traceback (most recent call last):
  File "LifoPnLDebug.py", line 32, in <module>
    df['E'], df['F'], df['G'] = zip(*df.apply(bar, axis = 1))
  File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 4877, in apply
    ignore_failures=ignore_failures)
  File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 4990, in _apply_standard
    result = self._constructor(data=results, index=index)
  File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 330, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 461, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 6173, in _arrays_to_mgr
    return create_block_manager_from_arrays(arrays, arr_names, axes)
  File "/usr/lib64/python2.7/site-packages/pandas/core/internals.py", line 4642, in create_block_manager_from_arrays
    construction_error(len(arrays), arrays[0].shape, axes, e)
  File "/usr/lib64/python2.7/site-packages/pandas/core/internals.py", line 4608, in construction_error
    passed, implied))
ValueError: Shape of passed values is (6, 3), indices imply (6, 4)
What am I doing wrong?
Upvotes: 0
Views: 702
Reputation: 9081
Modify the bar function like this:

def bar(row):
    b = row['B']
    c = row['C']
    d = row['D']
    a = row['A']
    e, f = myClass.foo( a, b, c, d )
    row['E'] = e
    row['F'] = f
    row['G'] = e+f
    return row

And then call it like this:

df = df.apply(bar, axis = 1)
Output
A B C D E F G
0 2018-03-26 16:56:05.352316 X 100 26.11 0.0 0.0 0.0
1 2018-03-26 18:30:25.972228 X 100 26.10 0.0 0.0 0.0
2 2018-03-26 19:07:37.321565 Y 150 26.20 0.0 0.0 0.0
3 2018-03-28 16:15:07.450381 Y 50 26.20 0.0 0.0 0.0
4 2018-03-28 19:59:34.827558 X 100 26.06 0.0 0.0 0.0
5 2018-03-29 16:55:41.329606 X 57 26.09 0.0 0.0 0.0
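A variation on the same idea, sketched here under some assumptions: instead of mutating and returning the whole row, bar can return only the new values as a named pd.Series, and the resulting frame can be joined back onto df. The foo below is a hypothetical stand-in (the question's foo just returns 0.0, 0.0), and the code is written for Python 3, unlike the Python 2 code in the question.

```python
import pandas as pd


def foo(a, b, c, d):
    # Hypothetical stand-in for MyClass.foo: returns two floats per row.
    return c * d, float(c)


def bar(row):
    e, f = foo(row['A'], row['B'], row['C'], row['D'])
    # Returning a Series with named entries makes apply build one
    # output column per name.
    return pd.Series({'E': e, 'F': f, 'G': e + f})


df = pd.DataFrame({
    'A': [1522083365352316, 1522089025972228, 1522091257321565,
          1522253707450381, 1522267174827558, 1522342541329606],
    'B': ['X', 'X', 'Y', 'Y', 'X', 'X'],
    'C': [100, 100, 150, 50, 100, 57],
    'D': [26.11, 26.1, 26.2, 26.2, 26.06, 26.09],
})
df['A'] = pd.to_datetime(df['A'], unit='us')

# new_cols holds only E, F and G; join aligns them with df on the index.
new_cols = df.apply(bar, axis=1)
result = df.join(new_cols)
print(result)
```

This leaves the original columns untouched, since apply only has to assemble the three new columns.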
Upvotes: 1
Reputation: 11
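If you only want to add the columns produced by that function, you can call bar on the whole DataFrame directly: df['E'], df['F'], df['G'] = bar(df). Note that row['B'] and the other lookups then return whole columns, so this relies on foo being able to handle entire columns rather than single values.
You will get output like this: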
A B C D E F G
0 2018-03-26 16:56:05.352316 X 100 26.11 0.0 0.0 0.0
1 2018-03-26 18:30:25.972228 X 100 26.10 0.0 0.0 0.0
2 2018-03-26 19:07:37.321565 Y 150 26.20 0.0 0.0 0.0
3 2018-03-28 16:15:07.450381 Y 50 26.20 0.0 0.0 0.0
4 2018-03-28 19:59:34.827558 X 100 26.06 0.0 0.0 0.0
5 2018-03-29 16:55:41.329606 X 57 26.09 0.0 0.0 0.0
You don't have to use the apply function to do this.
See if this is what you are looking for.
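As a rough sketch of what calling the function on whole columns looks like when the computation is not constant: the foo_vectorized below is a hypothetical replacement for foo that uses only column-wise operations, so no apply is needed. It assumes the real logic can be expressed that way, which the question does not confirm. Written for Python 3.

```python
import pandas as pd


def foo_vectorized(a, b, c, d):
    # Hypothetical stand-in for foo: every argument is a whole column
    # (a pandas Series), and the arithmetic runs element-wise.
    e = c * d
    f = c.astype(float)
    return e, f


df = pd.DataFrame({
    'A': [1522083365352316, 1522089025972228, 1522091257321565,
          1522253707450381, 1522267174827558, 1522342541329606],
    'B': ['X', 'X', 'Y', 'Y', 'X', 'X'],
    'C': [100, 100, 150, 50, 100, 57],
    'D': [26.11, 26.1, 26.2, 26.2, 26.06, 26.09],
})
df['A'] = pd.to_datetime(df['A'], unit='us')

# Pass the columns themselves instead of iterating over rows.
e, f = foo_vectorized(df['A'], df['B'], df['C'], df['D'])
df = df.assign(E=e, F=f, G=e + f)
print(df)
```

If foo needs per-row branching that cannot be written with column operations, the row-wise apply approach is the fallback.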
Upvotes: 0