Reputation: 1462
I have a DataFrame in a variable called "myDataFrame" that looks like this:
+------+-------+--------+
| Type | Count | Status |
+------+-------+--------+
| a    | 70    | 0      |
| a    | 70    | 0      |
| b    | 70    | 0      |
| c    | 74    | 3      |
| c    | 74    | 2      |
| c    | 74    | 0      |
+------+-------+--------+
I am using a vectorized approach to process the rows of this DataFrame, since it has about 116 million rows.
So I wrote something like this:
myDataFrame['result'] = processDataFrame(myDataFrame['status'], myDataFrame['Count'])
In my function, I am trying to do this:
def processDataFrame(status, count):
    resultsList = list()
    if status == 0:
        resultsList.append(count + 10000)
    else:
        resultsList.append(count - 10000)
    return resultsList
But I get this error when the status values are compared:
Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
What am I missing?
Upvotes: 0
Views: 52
Reputation: 8790
I think your function is not really doing the vectorized part.
When it is called, you pass status = myDataFrame['status'], so when it gets to the first if, it checks the condition myDataFrame['status'] == 0. But myDataFrame['status'] == 0 is a boolean Series (of whether each element of the status column equals 0), so it doesn't have a single truth value (hence the error). Similarly, even if the condition could be evaluated, resultsList would just get the whole "Count" column appended, either all plus 10000 or all minus 10000.
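To make the point above concrete, here is a minimal sketch that reproduces the error on a small frame. The sample data mirrors the table in the question, and the column names ('status', 'Count') are assumed to match the ones used in the question's code.

```python
import pandas as pd

# Sample data mirroring the question's table (column names are an assumption).
df = pd.DataFrame({
    "Type": ["a", "a", "b", "c", "c", "c"],
    "Count": [70, 70, 70, 74, 74, 74],
    "status": [0, 0, 0, 3, 2, 0],
})

# The comparison is element-wise, so the result is a boolean Series, not one bool.
mask = df["status"] == 0
print(mask.tolist())

error_message = None
try:
    if mask:  # `if` needs a single True/False, but the Series holds six values
        pass
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

The `if` statement is what fails here, not the comparison itself; the mask on its own is perfectly usable for indexing.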
Edit:
I suppose this function uses the built-in pandas functions, but applies them in your function:
def processDataFrame(status, count):
    status_0 = (status == 0)    # boolean mask: True where status is 0
    output = count.copy()       # if you don't want to modify in place
    output[status_0] += 10000   # same offsets as in the question
    output[~status_0] -= 10000
    return output
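As a quick check of the masking approach, the sketch below runs it on the question's sample data. The helper name add_or_subtract and its delta parameter are hypothetical, introduced only for this illustration, and the column names are assumed to match the question's code.

```python
import pandas as pd

def add_or_subtract(status, count, delta=10000):
    """Hypothetical helper: add delta where status is 0, subtract it elsewhere."""
    is_zero = status == 0          # boolean mask, one entry per row
    out = count.copy()             # leave the original column untouched
    out.loc[is_zero] += delta
    out.loc[~is_zero] -= delta
    return out

# Sample data mirroring the question's table (column names are an assumption).
df = pd.DataFrame({
    "Type": ["a", "a", "b", "c", "c", "c"],
    "Count": [70, 70, 70, 74, 74, 74],
    "status": [0, 0, 0, 3, 2, 0],
})

df["result"] = add_or_subtract(df["status"], df["Count"])
print(df["result"].tolist())  # [10070, 10070, 10070, -9926, -9926, 10074]
```

Because whole columns go in and a whole column comes out, no Python-level loop or `if` over individual rows is involved.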
Upvotes: 0
Reputation: 323236
We can do this without a user-defined function:
import numpy as np

myDataFrame['result'] = np.where(myDataFrame['status'] == 0,
                                 myDataFrame['Count'] + 10000,
                                 myDataFrame['Count'] - 10000)
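For reference, here is a self-contained sketch of the np.where approach on the question's sample data. The frame is rebuilt by hand and the column names are assumed to match the question's code, so treat it as an illustration, not a benchmark for the 116-million-row case.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question's table (column names are an assumption).
myDataFrame = pd.DataFrame({
    "Type": ["a", "a", "b", "c", "c", "c"],
    "Count": [70, 70, 70, 74, 74, 74],
    "status": [0, 0, 0, 3, 2, 0],
})

# np.where(condition, x, y) picks x where the condition holds and y elsewhere,
# element by element, and returns a NumPy array of the same length.
myDataFrame["result"] = np.where(
    myDataFrame["status"] == 0,
    myDataFrame["Count"] + 10000,
    myDataFrame["Count"] - 10000,
)

print(myDataFrame["result"].tolist())  # [10070, 10070, 10070, -9926, -9926, 10074]
```

Both branches are computed for every row and the condition only selects between them, which is fine here because neither branch can fail.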
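Update: your original function also works if you apply it row by row (this is not vectorized, and each result is a one-element list):
df.apply(lambda x: processDataFrame(x['Status'], x['Count']), axis=1)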
0 [10070]
1 [10070]
2 [10070]
3 [-9926]
4 [-9926]
5 [10074]
dtype: object
Upvotes: 5