Erick

Reputation: 103

Python error I can't resolve: the truth value of a Series is ambiguous

I have a dataframe called result that has the columns date1 and date2. All other columns are created by the code below.

What I want is to create three columns based on the values in the column date_diff. The first is called "Pending < 6 days" and holds 1 or 0 depending on whether the element in date_diff is between 0 and 6. The other two columns follow the same logic, with the names "7-21 days" and "22+ days".

result['date_diff'] = result['date2'] - result['date1']
result['date_diff'] = result['date_diff'].dt.days
result['date_diff'] = pd.to_numeric(result['date_diff'])


def menos_6dias(result):
    if 0 <= result['date_diff'] <= 6:
        return 1
    else:
        return 0

result['Pending < 6 days'] = result.apply(menos_6dias, axis=1)

def de_7_a_21dias(teste):
    if 7 <= result['date_diff'] <= 21:
        return 1
    else:
        return 0

result['7-21 days'] = result.apply(de_7_a_21dias, axis=1)

def mais_de_22dias(result):
    if result['date_diff'] >= 22:
        return 1
    else:
        return 0

result['22+ days'] = result.apply(mais_de_22dias, axis=1)

result.head()

I get an error that I believe is due to the datatype of the column date_diff, so I tried using .dt.days and pd.to_numeric, but that didn't help. The error is:

ValueError                                Traceback (most recent call last)
<ipython-input-34-78fa25211501> in <module>()
     18         return 0
     19 
---> 20 result['7-21 days'] = result.apply(de_7_a_21dias, axis=1)
     21 
     22 def mais_de_22dias(result):

/Users/elachmann/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4358                         f, axis,
   4359                         reduce=reduce,
-> 4360                         ignore_failures=ignore_failures)
   4361             else:
   4362                 return self._apply_broadcast(f, axis)

/Users/elachmann/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4454             try:
   4455                 for i, v in enumerate(series_gen):
-> 4456                     results[i] = func(v)
   4457                     keys.append(v.name)
   4458             except Exception as e:

<ipython-input-34-78fa25211501> in de_7_a_21dias(teste)
     13 
     14 def de_7_a_21dias(teste):
---> 15     if 7 <= result['dias pendentes na acao'] <= 21:
     16         return 1
     17     else:

/Users/elachmann/anaconda/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
    951         raise ValueError("The truth value of a {0} is ambiguous. "
    952                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 953                          .format(self.__class__.__name__))
    954 
    955     __bool__ = __nonzero__

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

Here are the column headers of my dataframe: contract id || name || email || company || company state || user.active || contract.active || date1 || date2 || Pending || Answered || Rejected || Canceled || Inactive || Total Requests || fb rq id || aux

Upvotes: 1

Views: 594

Answers (1)

holdenweb

Reputation: 37003

Consider the following DataFrame, hd.

            beer_servings
country
Armenia                21
Bulgaria              231
Cuba                   93
France                127
Iran                    0
Libya                   0
Mozambique             47
Peru                  163
Serbia                283
Thailand               99
Vanuatu                21

You probably know that a comparison with a Pandas column gives you a column of Booleans.

In [54]: hd['beer_servings'] < 50
Out[54]:
country
Armenia        True
Bulgaria      False
Cuba          False
France        False
Iran           True
Libya          True
Mozambique     True
Peru          False
Serbia        False
Thailand      False
Vanuatu        True
Name: beer_servings, dtype: bool

You may not know that the Series has an astype method that will let you convert the Boolean column to integer.

In [57]: (hd['beer_servings'] < 50).astype(int)
Out[57]:
country
Armenia       1
Bulgaria      0
Cuba          0
France        0
Iran          1
Libya         1
Mozambique    1
Peru          0
Serbia        0
Thailand      0
Vanuatu       1
Name: beer_servings, dtype: int64

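I think you have demonstrated sufficient Pandas knowledge to take it from there, with the caveat that chained comparisons like 0 < df['column'] < 12 don't work on a whole column (Python evaluates them with an implicit "and", which needs a single truth value and so raises the "ambiguous" error), and have to be recast as (df['column'] > 0) & (df['column'] < 12) or similar.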
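Applied to the columns from the question, that could look roughly like the sketch below. The sample dates are made up for illustration; only the column names (date1, date2, date_diff and the three flag columns) come from the question. Series.between is used for the middle bucket as an alternative spelling of the two-sided test; it includes both endpoints by default.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
result = pd.DataFrame({
    "date1": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01"]),
    "date2": pd.to_datetime(["2018-01-04", "2018-01-08", "2018-01-22", "2018-02-15"]),
})

# Subtracting datetime columns gives timedeltas; .dt.days turns them into integers.
result["date_diff"] = (result["date2"] - result["date1"]).dt.days

days = result["date_diff"]

# Each comparison yields a Boolean column; combine with & and convert to 0/1.
result["Pending < 6 days"] = ((days >= 0) & (days <= 6)).astype(int)
result["7-21 days"] = days.between(7, 21).astype(int)  # both endpoints included
result["22+ days"] = (days >= 22).astype(int)

print(result[["date_diff", "Pending < 6 days", "7-21 days", "22+ days"]])
```

If you would rather keep the row-wise apply, note that de_7_a_21dias in the question receives its row as teste but tests result['date_diff'], which is the whole column of the outer dataframe. Testing teste['date_diff'] instead gives a single number per row, so the chained comparison works there.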

Upvotes: 1
