Define variable number of columns in for loop

Question

I am new to pandas and I am creating new columns based on conditions from other existing columns using the following code:

df.loc[(df.item1_existing=='NO') & (df.item1_sold=='YES'),'unit_item1']=1
df.loc[(df.item2_existing=='NO') & (df.item2_sold=='YES'),'unit_item2']=1
df.loc[(df.item3_existing=='NO') & (df.item3_sold=='YES'),'unit_item3']=1

Basically, what this means is that if item is NOT existing ('NO') and the item IS sold ('YES') then give me a 1. This works to create 3 new columns but I am thinking there is a better way. As you can see, there is a repeated string in the name of the columns: '_existing' and '_sold'. I am trying to create a for loop that will look for the name of the column that ends with that specific word and concatenate the beginning, something like this:

unit_cols = ['item1','item2','item3']
for i in unit_cols:
    df.loc[('df.'+i+'_existing'=='NO') & ('df'+i+'_sold'=='YES'),'unit_'+i]=1

but of course, it doesn't work. As I said, I am able to make it work with the initial example, but I would like to have fewer lines of code instead of repeating the same code because I need to create several columns this way, not just three. Is there a way to make this easier? is the for loop the best option? Thank you.

jpp · Accepted Answer

You can use Boolean series, i.e. True / False depending on whether your condition is met. Coupled with pd.Series.eq and f-strings (PEP498, Python 3.6+), and using __getitem__ (or its syntactic sugar []) to allow string inputs, you can write your logic more readably:

unit_cols = ['item1','item2','item3']
for i in unit_cols:
    df[f'unit_{i}'] = df[f'{i}_existing'].eq('NO') & df[f'{i}_sold'].eq('YES')

If you need integers (1 / 0) instead of Boolean values, you can convert via astype:

    df[f'unit_{i}'] = df[f'unit_{i}'].astype(int)

Define variable number of columns in for loop

Answers (1)

Related Questions