Reputation: 103
I have a dataframe called result that has the columns date1 and date2. All other columns are created as shown below.
What I want is to create three columns based on the values in the column date_diff. One is called "less than 6 days" and holds 1 or 0 depending on whether the element in date_diff is between 0 and 6. The other columns follow the same logic, with the names "7-21 days" and "22+ days".
result['date_diff'] = result['date2'] - result['date1']
result['date_diff'] = result['date_diff'].dt.days
pd.to_numeric(result['date_diff'])

def menos_6dias(result):
    if 0 <= result['date_diff'] <= 6:
        return 1
    else:
        return 0

result['Pending < 6 days'] = result.apply(menos_6dias, axis=1)

def de_7_a_21dias(teste):
    if 7 <= result['date_diff'] <= 21:
        return 1
    else:
        return 0

result['7-21 days'] = result.apply(de_7_a_21dias, axis=1)

def mais_de_22dias(result):
    if result['date_diff'] >= 22:
        return 1
    else:
        return 0

result['22+ days'] = result.apply(mais_de_22dias, axis=1)
result.head()
I get an error that I believe is due to the data type of the column date_diff. I tried using .dt.days and pd.to_numeric, but that didn't work. The error is:
ValueError Traceback (most recent call last)
<ipython-input-34-78fa25211501> in <module>()
18 return 0
19
---> 20 result['7-21 days'] = result.apply(de_7_a_21dias, axis=1)
21
22 def mais_de_22dias(result):
/Users/elachmann/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4358 f, axis,
4359 reduce=reduce,
-> 4360 ignore_failures=ignore_failures)
4361 else:
4362 return self._apply_broadcast(f, axis)
/Users/elachmann/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4454 try:
4455 for i, v in enumerate(series_gen):
-> 4456 results[i] = func(v)
4457 keys.append(v.name)
4458 except Exception as e:
<ipython-input-34-78fa25211501> in de_7_a_21dias(teste)
13
14 def de_7_a_21dias(teste):
---> 15 if 7 <= result['dias pendentes na acao'] <= 21:
16 return 1
17 else:
/Users/elachmann/anaconda/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
951 raise ValueError("The truth value of a {0} is ambiguous. "
952 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 953 .format(self.__class__.__name__))
954
955 __bool__ = __nonzero__
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')
Here are the column headers of my dataframe: contract id || name || email || company || company state || user.active || contract.active || date1 || date2 || Pending || Answered || Rejected || Canceled || Inactive || Total Requests || fb rq id || aux
Upvotes: 1
Views: 594
Reputation: 37003
Consider the following DataFrame, hd.
beer_servings
country
Armenia 21
Bulgaria 231
Cuba 93
France 127
Iran 0
Libya 0
Mozambique 47
Peru 163
Serbia 283
Thailand 99
Vanuatu 21
You probably know that a comparison with a Pandas column gives you a column of Booleans.
In [54]: pd.to_numeric(hd['beer_servings'] < 50)
Out[54]:
country
Armenia True
Bulgaria False
Cuba False
France False
Iran True
Libya True
Mozambique True
Peru False
Serbia False
Thailand False
Vanuatu True
Name: beer_servings, dtype: bool
You may not know that the Series has an astype method that will let you convert the Boolean column to integer.
In [57]: (hd['beer_servings'] < 50).astype(int)
Out[57]:
country
Armenia 1
Bulgaria 0
Cuba 0
France 0
Iran 1
Libya 1
Mozambique 1
Peru 0
Serbia 0
Thailand 0
Vanuatu 1
Name: beer_servings, dtype: int64
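I think you have demonstrated sufficient Pandas knowledge to take it from there, with one caveat: chained comparisons like 0 < df['column'] < 12 don't work on a Series (they raise the same "truth value of a Series is ambiguous" error) and have to be rewritten as (df['column'] > 0) & (df['column'] < 12) or similar. The parentheses are required because & binds more tightly than the comparison operators.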
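Applied to the question, the idea might look like the sketch below. The sample dates are made up for illustration; only the column names date1, date2, date_diff and the three flag columns come from the question. Note that the traceback points at de_7_a_21dias, which takes its row as teste but then reads the global result, so the comparison runs against the whole column rather than a single value. Building the flags from Boolean masks avoids apply altogether.

```python
import pandas as pd

# Hypothetical sample data; the real dataframe has many more columns.
result = pd.DataFrame({
    "date1": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01"]),
    "date2": pd.to_datetime(["2018-01-04", "2018-01-08", "2018-01-22", "2018-02-15"]),
})

# Subtracting two datetime columns gives timedeltas; .dt.days turns them into integers.
result["date_diff"] = (result["date2"] - result["date1"]).dt.days

diff = result["date_diff"]

# Each comparison yields a Boolean Series; & combines them element-wise,
# and astype(int) converts True/False to 1/0.
result["Pending < 6 days"] = ((diff >= 0) & (diff <= 6)).astype(int)
result["7-21 days"] = ((diff >= 7) & (diff <= 21)).astype(int)
result["22+ days"] = (diff >= 22).astype(int)

print(result)
```

If the explicit pair of comparisons feels noisy, diff.between(7, 21) should give the same inclusive range check as a single call.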
Upvotes: 1