Reputation: 367
I have a csv file that I am turning into a pandas dataframe. One of the columns a
is mostly filled with numbers and occasionally filled with zeros. I would like to make a new column e
that is the number in column a+500
, EXCEPT if there is a 0
in that row for the a
column. In those cases it should just stay 0
. Or I guess it would also work if column e
was a+500
, and then all of the cases in e
that were just 500
were turned into zero. Any help would be great, I'm new to using pandas/python.
Upvotes: 2
Views: 283
Reputation: 85422
>>> df = pd.DataFrame({'a': [100, 0, 200], 'b': [200, 500, 0]})
>>> df
a b
0 100 200
1 0 500
2 200 0
You can use where for fast generation of your column:
>>> df['e'] = df['a'].where(df['a'] == 0, df['a'] + 500)
>>> df
a b e
0 100 200 600
1 0 500 0
2 200 0 700
For a data frame with three million rows:
n = int(1e6)
df = pd.DataFrame({'a': [100, 0, 200] * n, 'b': [200, 500, 0] * n})
using apply (as suggested in another answer here) is pretty slow:
%timeit df['new_a'] = df['a'].astype('int').map(lambda x: x+500 if x != 0 else 0)
1 loops, best of 3: 2.5 s per loop
compared to using where()
:
%timeit df['e'] = df['a'].where(df['a'] == 0, df['a'] + 500)
10 loops, best of 3: 90.9 ms per loop
It is about 28 times faster.
Upvotes: 1
Reputation: 2564
Try this :
df['new_a'] = df['a'].astype('int').map(lambda x: x+500 if x != 0 else 0)
Upvotes: 1
Reputation: 2414
I'd propose to write a function and use pd.apply
like that:
import pandas as pd
df = pd.DataFrame({'a': [0, 1]})
def add500ifnot0(c):
if c == 0:
return c
else:
return c + 500
df['e'] = df['a'].apply(add500ifnot0)
df
Upvotes: 0