Advait Vartak

Reputation: 15

Add a column to a df where if a certain value is 0, return 1 else return the original value of the column

The Python code I am using to try to achieve this is:

df['column2'] = np.where(df['column1'] == 0, 1, df['column1'])

Upvotes: 0

Views: 3825

Answers (4)

Trenton McKinney

Reputation: 62463

  • For the sample dataframe it is fastest to use np.where.
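  • You can also use pandas.Series.where (called here as df.a.where), which keeps the original value where the condition is True and substitutes the replacement value where the condition is False.
  • 100 is used as the replacement in the example below, instead of 1, to make the update easier to see.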
import pandas as pd

# test dataframe
df = pd.DataFrame({'a': [2, 4, 1, 0, 2, 2, 0, 8, 4, 0], 'b': [2, 4, 0, 9, 2, 0, 2, 8, 0, 3]})

# replace 0 with 100 or leave the same number based on the same column
df['0 → 100 on a if a'] = df.a.where(df.a != 0, 100)

# replace 0 with 100 or leave the same number based on a different column
df['0 → 100 on a if b'] = df.a.where(df.b != 0, 100)

# display(df)
   a  b  0 → 100 on a if a  0 → 100 on a if b
0  2  2                  2                  2
1  4  4                  4                  4
2  1  0                  1                100
3  0  9                100                  0
4  2  2                  2                  2
5  2  0                  2                100
6  0  2                100                  0
7  8  8                  8                  8
8  4  0                  4                100
9  0  3                100                  0
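
As a rough sketch of how the condition is read in each call (the small dataframe and the variable names here are made up for illustration, not taken from the question): Series.where replaces values where the condition is False, Series.mask is its inverse, and np.where takes the condition first and returns a NumPy array rather than a Series.

```python
import numpy as np
import pandas as pd

# small illustrative dataframe (hypothetical data)
df = pd.DataFrame({'a': [2, 4, 1, 0, 2], 'b': [2, 4, 0, 9, 2]})

# Series.where keeps values where the condition is True and replaces the rest
kept = df['a'].where(df['a'] != 0, 100)

# Series.mask is the inverse: it replaces values where the condition is True
masked = df['a'].mask(df['a'] == 0, 100)

# np.where(condition, value_if_true, value_if_false) returns a NumPy array
arr = np.where(df['a'] == 0, 100, df['a'])

print(kept.tolist())
print(masked.tolist())
print(arr.tolist())
```

All three should give the same values; which one to use is mostly a matter of whether the condition reads more naturally as "keep when" or "replace when".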

%%timeit testing

Test Data

import pandas as pd
import numpy as np

# test dataframe with 1M rows
np.random.seed(365)
df = pd.DataFrame({'a': np.random.randint(0, 10, size=(1000000)), 'b': np.random.randint(0, 10, size=(1000000))})

Tests

%%timeit
np.where(df.a == 0, 1, df.a)
[out]:
161 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
np.where(df.b == 0, 1, df.a)
[out]:
164 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
df.a.where(df.a != 0, 1)
[out]:
4.51 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
df.a.where(df.b != 0, 1)
[out]:
4.55 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
noah1(df)
[out]:
4.63 ms ± 58.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
noah2(df)
[out]:
15.3 s ± 205 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
paul(df)
[out]:
341 ms ± 34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
karam(df)
[out]:
299 ms ± 4.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Functions

def noah1(d):
    return d.a.replace(0, 1)

def noah2(d):
    return d.apply(lambda x: 1 if x.a == 0 else x.b, axis=1)

def paul(d):
    return [1 if v==0 else v for v in d.a.values]

def karam(d):
    return d.a.apply(lambda x: 1 if x == 0 else x)
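
Before comparing timings it can be worth checking that the candidates return the same values. The following is a minimal sketch on a tiny made-up dataframe. Note that noah2, as written above, returns column b when a is not 0, so it computes the more general "value from another column" case and is not expected to match the others.

```python
import numpy as np
import pandas as pd

# tiny illustrative dataframe (hypothetical data)
df = pd.DataFrame({'a': [2, 0, 1, 0], 'b': [5, 6, 0, 7]})

# reference result: replace 0 in column a with 1
expected = np.where(df.a == 0, 1, df.a).tolist()

results = {
    'Series.where': df.a.where(df.a != 0, 1).tolist(),
    'replace': df.a.replace(0, 1).tolist(),
    'list comprehension': [1 if v == 0 else v for v in df.a.values],
    'Series.apply': df.a.apply(lambda x: 1 if x == 0 else x).tolist(),
}
for name, result in results.items():
    print(name, result == expected)

# the row-wise version from noah2 falls back to column b, not column a
row_wise = df.apply(lambda x: 1 if x.a == 0 else x.b, axis=1).tolist()
print('row-wise apply', row_wise)
```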

Upvotes: 4

noah

Reputation: 2786

What you want is essentially to just copy the column and replace 0s with 1s:

df["Column2"] = df["Column1"].replace(0,1)

More generally, if you want the value from some other column, ColumnX, whenever Column1 is not 0, you can use apply with a lambda function:

df["Column2"] = df.apply(lambda x: 1 if x["Column1"]==0 else x['ColumnX'], axis=1)
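
For comparison, the same "1 if Column1 is 0, otherwise ColumnX" rule can probably also be written without a row-wise apply. This is only a sketch with made-up data, and it assumes both columns are numeric:

```python
import numpy as np
import pandas as pd

# illustrative data; the column names follow the answer, the values are made up
df = pd.DataFrame({'Column1': [0, 3, 0, 7], 'ColumnX': [10, 20, 30, 40]})

# row-wise version from the answer
row_wise = df.apply(lambda x: 1 if x['Column1'] == 0 else x['ColumnX'], axis=1)

# vectorized version: the condition comes from Column1, the fallback from ColumnX
df['Column2'] = np.where(df['Column1'] == 0, 1, df['ColumnX'])

print(df)
```

The vectorized form avoids calling a Python function once per row, which tends to matter on large dataframes.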

Upvotes: 1

Paul Wilson

Reputation: 560

The apply example provided above should work, or this works too:

df['column_2'] = [1 if v==0 else v for v in df['col'].values]

My example uses list comprehension: https://www.w3schools.com/python/python_lists_comprehension.asp

And the other answer uses lambda function: https://www.w3schools.com/python/python_lambda.asp

Personally, when writing scripts that others may use, I think list comprehensions are more widely known and therefore more readable. However, I believe the lambda function performs faster and is in general a highly useful tool, so it is probably to be recommended over the list comprehension.
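
Rather than relying on belief, the speed question can be checked directly. Below is a minimal sketch using the standard-library timeit module. The function names, the data size, and the number of repetitions are arbitrary choices made for this example, and the numbers will differ between machines and pandas versions.

```python
import timeit

import numpy as np
import pandas as pd

# illustrative data: 100,000 random integers between 0 and 9
rng = np.random.default_rng(0)
df = pd.DataFrame({'col': rng.integers(0, 10, size=100_000)})

def with_comprehension(d):
    return [1 if v == 0 else v for v in d['col'].values]

def with_apply(d):
    return d['col'].apply(lambda x: 1 if x == 0 else x)

def with_np_where(d):
    return np.where(d['col'] == 0, 1, d['col'])

# average seconds per call over a few repetitions
timings = {}
for func in (with_comprehension, with_apply, with_np_where):
    timings[func.__name__] = timeit.timeit(lambda: func(df), number=3) / 3
    print(f'{func.__name__}: {timings[func.__name__]:.4f} s per call')
```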

Upvotes: 2

Karan Shishoo

Reputation: 2802

You should be able to achieve that using an apply statement in this manner:

df['column2'] = df['column1'].apply(lambda x: 1 if x == 0 else x)

Upvotes: 0
