Dougie Fresh

Reputation: 27

How can I make this python code more efficient?

I realize this is an incredibly inefficient way to code this, so I'm hoping someone will have suggestions on a more efficient method.

Essentially I'm trying to create a column ("freq") that is 0 where "obj" is NaN or "Nothing", and 1 otherwise. Sample df:

i   obj         freq
0   Nothing     0
1   Something   1
2   NaN         0
3   Something   1


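import pandas as pd

# Row-by-row loop: works, but slow because every row is checked in Python
for i in range(len(df)):
  value = df["obj"].iloc[i]
  if pd.isna(value) or value == "Nothing":
    df.loc[df.index[i], "freq"] = 0
  else:
    df.loc[df.index[i], "freq"] = 1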
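A quick check of why comparing against a string is an unreliable way to detect missing values: the string form of NaN is lowercase `'nan'`, and NaN never compares equal to anything, including itself. `pd.isna` is the check pandas provides for this. This is a small standalone sketch, not part of the original question:

```python
import numpy as np
import pandas as pd

value = np.nan

# The string form of NaN is lowercase, so comparing against "NaN" never matches
print(str(value))            # nan
print(str(value) == "NaN")   # False

# NaN is not equal to itself, so == cannot detect it either
print(value == value)        # False

# pd.isna is the reliable test for a single missing value
print(pd.isna(value))        # True
```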

Upvotes: 0

Views: 88

Answers (3)

hpchavaz

Reputation: 1388

In this case, it is not even necessary to use numpy:

df['freq'] = (~(df.obj.isnull() | (df.obj == 'Nothing'))) * 1

Note:

Is it really useful to encode the result as 0 and 1? Can't we keep the result of the boolean operation, i.e. the values False and True? If so, the answer is simply:

df['freq'] = ~(df.obj.isnull() | (df.obj == 'Nothing'))
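A self-contained sketch of both variants above on the sample data from the question. It uses `.astype(int)` as an equivalent, arguably more explicit, alternative to `* 1`; the names `empty` and `freq_bool` are illustrative only:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'obj': ['Nothing', 'Something', np.nan, 'Something']})

# Boolean mask: True where obj is missing or equal to "Nothing"
empty = df['obj'].isnull() | (df['obj'] == 'Nothing')

# Integer version (0/1); astype(int) gives the same result as "* 1"
df['freq'] = (~empty).astype(int)

# Boolean version, if True/False is acceptable
df['freq_bool'] = ~empty

print(df)
```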

Upvotes: 0

Matthew Borish

Reputation: 3096

You can use np.where():

import pandas as pd 
import numpy as np

df = pd.DataFrame({'obj': {0: 'Nothing', 1: 'Something', 2: np.nan, 3: 'Something'}})

df['freq'] = np.where((df['obj'] == 'Nothing') | (df['obj'].isnull()), 0, 1)
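If more strings than "Nothing" should count as empty, the same `np.where` call can take a `Series.isin` test instead of a single equality. A sketch, assuming a hypothetical `placeholders` list; the missing-value test is kept as a separate `isna()` so NaN is handled explicitly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'obj': {0: 'Nothing', 1: 'Something', 2: np.nan, 3: 'Something'}})

# Hypothetical list of strings to treat as "empty"; extend as needed
placeholders = ['Nothing']

# True where obj is one of the placeholders or is missing
is_empty = df['obj'].isin(placeholders) | df['obj'].isna()
df['freq'] = np.where(is_empty, 0, 1)

print(df['freq'].tolist())  # [0, 1, 0, 1]
```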

Upvotes: 2

valcarcexyz

Reputation: 622

Without a dataframe it's hard to check whether this works, but it should:

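# True where obj is "Nothing" or missing (astype(str) would turn NaN into 'nan', not 'NaN')
indexer = (df['obj'] == 'Nothing') | df['obj'].isna()
df.loc[indexer, 'freq'] = 0
df.loc[~indexer, 'freq'] = 1
df['freq'] = df['freq'].astype(int)  # the partial .loc assignments leave a float column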
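A variant of the `.loc` approach that needs only one masked assignment: initialize the whole column to 1 first, then overwrite the "empty" rows. Because the column already exists as integers, no intermediate float column is created. This is a sketch on the question's sample data, not the answerer's code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'obj': ['Nothing', 'Something', np.nan, 'Something']})

# True where obj is missing or the placeholder string "Nothing"
indexer = (df['obj'] == 'Nothing') | df['obj'].isna()

# Start every row at 1 (an integer column), then overwrite only the "empty" rows
df['freq'] = 1
df.loc[indexer, 'freq'] = 0

print(df['freq'].tolist())  # [0, 1, 0, 1]
```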

Upvotes: 0
