Reputation: 15
The Python code with which I am trying to achieve this result is:
df['column2'] = np.where(df['column1'] == 0, 1, df['column1'])
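For anyone who wants to run the question's snippet on its own, here is a minimal self-contained version. The five-row frame is made up purely for illustration and is not from the original question.

```python
import numpy as np
import pandas as pd

# small made-up example frame (illustrative values only)
df = pd.DataFrame({'column1': [3, 0, 7, 0, 5]})

# where column1 == 0 use 1, otherwise keep the value from column1
df['column2'] = np.where(df['column1'] == 0, 1, df['column1'])

print(df['column2'].tolist())  # [3, 1, 7, 1, 5]
```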
Upvotes: 0
Views: 3825
Reputation: 62463
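For a pandas dataframe, it is fastest to use np.where.
Alternatively, use pandas.DataFrame.where (applied here to a single column, i.e. pandas.Series.where), which keeps the value from the dataframe column where the condition is True and replaces it where the condition is False.
100 is used as the replacement value to make the update easier to see.
import pandas as pd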
# test dataframe
df = pd.DataFrame({'a': [2, 4, 1, 0, 2, 2, 0, 8, 4, 0], 'b': [2, 4, 0, 9, 2, 0, 2, 8, 0, 3]})
# replace 0 with 100 or leave the same number based on the same column
df['0 → 100 on a if a'] = df.a.where(df.a != 0, 100)
# replace 0 with 100 or leave the same number based on a different column
df['0 → 100 on a if b'] = df.a.where(df.b != 0, 100)
# display(df)
   a  b  0 → 100 on a if a  0 → 100 on a if b
0  2  2                  2                  2
1  4  4                  4                  4
2  1  0                  1                100
3  0  9                100                  0
4  2  2                  2                  2
5  2  0                  2                100
6  0  2                100                  0
7  8  8                  8                  8
8  4  0                  4                100
9  0  3                100                  0
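As a sketch of how the two approaches relate, assuming the same ten-row test dataframe as above: np.where(cond, x, y) takes x where the condition is True and y otherwise, while Series.where(cond, other) keeps the original value where the condition is True and substitutes other where it is False. The condition is therefore inverted between the two calls. np.where returns a NumPy array; Series.where returns a Series that keeps the original index.

```python
import numpy as np
import pandas as pd

# same test dataframe as above
df = pd.DataFrame({'a': [2, 4, 1, 0, 2, 2, 0, 8, 4, 0],
                   'b': [2, 4, 0, 9, 2, 0, 2, 8, 0, 3]})

# condition on the same column: 100 where a == 0, otherwise a
via_np = np.where(df.a == 0, 100, df.a)    # NumPy array
via_pd = df.a.where(df.a != 0, 100)        # Series, condition inverted

# condition on a different column: 100 where b == 0, otherwise a
via_np_b = np.where(df.b == 0, 100, df.a)
via_pd_b = df.a.where(df.b != 0, 100)

print(type(via_np).__name__, type(via_pd).__name__)  # ndarray Series
print((via_np == via_pd.to_numpy()).all())           # True
print((via_np_b == via_pd_b.to_numpy()).all())       # True
```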
%%timeit testing
import pandas as pd
import numpy as np
# test dataframe with 1M rows
np.random.seed(365)
df = pd.DataFrame({'a': np.random.randint(0, 10, size=(1000000)), 'b': np.random.randint(0, 10, size=(1000000))})
%%timeit
np.where(df.a == 0, 1, df.a)
[out]:
161 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
np.where(df.b == 0, 1, df.a)
[out]:
164 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
df.a.where(df.a != 0, 1)
[out]:
4.51 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df.a.where(df.b != 0, 1)
[out]:
4.55 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
noah1(df)
[out]:
4.63 ms ± 58.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
noah2(df)
[out]:
15.3 s ± 205 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
paul(df)
[out]:
341 ms ± 34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
karam(df)
[out]:
299 ms ± 4.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# functions used in the timings above (adapted from the other answers)
def noah1(d):
    return d.a.replace(0, 1)

def noah2(d):
    return d.apply(lambda x: 1 if x.a == 0 else x.b, axis=1)

def paul(d):
    return [1 if v == 0 else v for v in d.a.values]

def karam(d):
    return d.a.apply(lambda x: 1 if x == 0 else x)
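%%timeit is an IPython/Jupyter cell magic, so the timings above need a notebook. A rough equivalent for a plain Python script, using the standard-library timeit module, might look like the sketch below. It assumes a smaller 10,000-row frame so the row-wise apply finishes quickly, and its row-wise candidate reads column a (unlike noah2 above, which returns b) so that every candidate computes the same result and can be checked against np.where before timing. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# assumption: 10k rows instead of 1M so the row-wise apply stays fast
np.random.seed(365)
n = 10_000
df = pd.DataFrame({'a': np.random.randint(0, 10, size=n),
                   'b': np.random.randint(0, 10, size=n)})

# each candidate replaces 0 in column a with 1 and otherwise keeps a
candidates = {
    'np.where': lambda d: np.where(d.a == 0, 1, d.a),
    'Series.where': lambda d: d.a.where(d.a != 0, 1),
    'Series.replace': lambda d: d.a.replace(0, 1),
    'list comprehension': lambda d: [1 if v == 0 else v for v in d.a.values],
    'Series.apply': lambda d: d.a.apply(lambda x: 1 if x == 0 else x),
    # row-wise variant reads a (not b) so its result is comparable
    'DataFrame.apply axis=1': lambda d: d.apply(lambda x: 1 if x.a == 0 else x.a, axis=1),
}

# sanity check: every approach must produce the same values
expected = np.where(df.a == 0, 1, df.a)
for name, fn in candidates.items():
    assert np.array_equal(np.asarray(fn(df)), expected), name

# best of 5 single runs per candidate, reported in milliseconds
for name, fn in candidates.items():
    best = min(timeit.repeat(lambda: fn(df), number=1, repeat=5))
    print(f'{name:>24}: {best * 1e3:.3f} ms')
```

Taking the minimum of the repeats, rather than the mean, follows the advice in the timeit documentation, since higher values are usually caused by other processes interfering.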
Upvotes: 4
Reputation: 2786
What you want is essentially to just copy the column and replace 0s with 1s:
df["Column2"] = df["Column1"].replace(0,1)
More generally, if you want the value from some other column, ColumnX, whenever Column1 is not 0, you can use the following lambda function:
df["Column2"] = df.apply(lambda x: 1 if x["Column1"]==0 else x['ColumnX'], axis=1)
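If speed matters, the row-wise apply can likely be replaced by a vectorized np.where call that takes the fallback value from the other column, in line with the timings in the first answer. A minimal sketch, assuming a small hypothetical frame with columns Column1 and ColumnX:

```python
import numpy as np
import pandas as pd

# hypothetical frame: Column1 decides, ColumnX supplies the value otherwise
df = pd.DataFrame({'Column1': [0, 5, 0, 2], 'ColumnX': [10, 20, 30, 40]})

# row-wise version from the answer above
row_wise = df.apply(lambda x: 1 if x['Column1'] == 0 else x['ColumnX'], axis=1)

# vectorized equivalent: 1 where Column1 is 0, otherwise the value from ColumnX
df['Column2'] = np.where(df['Column1'] == 0, 1, df['ColumnX'])

print(df['Column2'].tolist())  # [1, 20, 1, 40]
```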
Upvotes: 1
Reputation: 560
The apply example provided above should work, or this works too:
df['column_2'] = [1 if v==0 else v for v in df['col'].values]
My example uses a list comprehension: https://www.w3schools.com/python/python_lists_comprehension.asp
The other answer uses a lambda function: https://www.w3schools.com/python/python_lambda.asp
Personally, when writing scripts that others may use, I think list comprehensions are more widely known and therefore more readable, but I believe a lambda function performs faster and is a highly useful tool in general, so it is probably preferable to a list comprehension.
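A small sketch showing that the two styles produce the same values, assuming a hypothetical column named col as in the snippet above:

```python
import pandas as pd

# hypothetical single-column frame
df = pd.DataFrame({'col': [0, 3, 0, 8]})

# list comprehension over the underlying NumPy array
df['via_listcomp'] = [1 if v == 0 else v for v in df['col'].values]

# the same logic as a lambda passed to Series.apply
df['via_lambda'] = df['col'].apply(lambda x: 1 if x == 0 else x)

print(df['via_listcomp'].tolist() == df['via_lambda'].tolist())  # True
```

Both still run a Python-level loop per element. In the 1M-row timings in the first answer, the list comprehension (paul) took about 341 ms and Series.apply (karam) about 299 ms, versus about 161 µs for np.where.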
Upvotes: 2
Reputation: 2802
You should be able to achieve that using an apply statement in this manner:
df['column2'] = df['column1'].apply(lambda x: 1 if x == 0 else x)
Upvotes: 0