user5826447
user5826447

Reputation: 367

adding to a column in a df in pandas python

I have a csv file that I am turning into a pandas dataframe. One of the columns a is mostly filled with numbers and occasionally filled with zeros. I would like to make a new column e that is the number in column a+500, EXCEPT if there is a 0 in that row for the a column. In those cases it should just stay 0. Or I guess it would also work if column e was a+500, and then all of the cases in e that were just 500 were turned into zero. Any help would be great, I'm new to using pandas/python.

Upvotes: 2

Views: 283

Answers (3)

Mike Müller
Mike Müller

Reputation: 85422

Example data frame

>>> df = pd.DataFrame({'a': [100, 0, 200], 'b': [200, 500, 0]})
>>> df
     a    b
0  100  200
1    0  500
2  200    0

Solution

You can use where for fast generation of your column:

>>> df['e'] = df['a'].where(df['a'] == 0, df['a'] + 500)
>>> df
     a    b    e
0  100  200  600
1    0  500    0
2  200    0  700

Performance

For a data frame with three million rows:

n = int(1e6)
df = pd.DataFrame({'a': [100, 0, 200] * n, 'b': [200, 500, 0] * n})

using apply (as suggested in another answer here) is pretty slow:

%timeit df['new_a'] = df['a'].astype('int').map(lambda x: x+500 if x != 0 else 0)
1 loops, best of 3: 2.5 s per loop

compared to using where():

%timeit df['e'] = df['a'].where(df['a'] == 0, df['a'] + 500)
10 loops, best of 3: 90.9 ms per loop

It is about 28 times faster.

Upvotes: 1

DavidK
DavidK

Reputation: 2564

Try this :

df['new_a'] = df['a'].astype('int').map(lambda x: x+500 if x != 0 else 0)

Upvotes: 1

Fabian Rost
Fabian Rost

Reputation: 2414

I'd propose to write a function and use pd.apply like that:

import pandas as pd
df = pd.DataFrame({'a': [0, 1]})
def add500ifnot0(c):
    if c == 0:
        return c
    else:
        return c + 500
df['e'] = df['a'].apply(add500ifnot0)
df

Upvotes: 0

Related Questions