I am new to pandas and I am creating new columns based on conditions from other existing columns using the following code:
df.loc[(df.item1_existing=='NO') & (df.item1_sold=='YES'),'unit_item1']=1
df.loc[(df.item2_existing=='NO') & (df.item2_sold=='YES'),'unit_item2']=1
df.loc[(df.item3_existing=='NO') & (df.item3_sold=='YES'),'unit_item3']=1
In other words, if an item does NOT exist ('NO') and the item IS sold ('YES'), set the new column to 1. This works for creating 3 new columns, but I suspect there is a better way. As you can see, the column names share the suffixes '_existing' and '_sold'. I am trying to write a for loop that finds the columns ending with those suffixes and builds each column name by concatenating the common prefix, something like this:
unit_cols = ['item1','item2','item3']
for i in unit_cols:
    df.loc[('df.'+i+'_existing'=='NO') & ('df'+i+'_sold'=='YES'),'unit_'+i]=1
But of course, it doesn't work. As I said, the first version works, but I need to create many more than three columns this way, so I would like to avoid repeating the same line for each one. Is there an easier way to do this? Is a for loop the best option? Thank you.
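The loop above fails because an expression such as 'df.'+i+'_existing'=='NO' compares two plain Python strings (always False) instead of selecting a column. Below is a minimal sketch of the same loop with each column looked up by name via df[...]. It keeps the original .loc semantics: 1 where the condition holds, NaN elsewhere. The sample DataFrame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column naming pattern follows the question.
df = pd.DataFrame({
    'item1_existing': ['NO', 'YES', 'NO'],
    'item1_sold':     ['YES', 'YES', 'NO'],
    'item2_existing': ['NO', 'NO', 'YES'],
    'item2_sold':     ['NO', 'YES', 'YES'],
})

unit_cols = ['item1', 'item2']
for i in unit_cols:
    # Build the column names as strings, then select the actual columns with df[...]
    cond = (df[i + '_existing'] == 'NO') & (df[i + '_sold'] == 'YES')
    # Same effect as the original .loc lines: 1 where cond is True, NaN elsewhere
    df.loc[cond, 'unit_' + i] = 1

print(df[['unit_item1', 'unit_item2']])
```

Rows that fail the condition are left as NaN, which is why the new columns come out as floats.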
You can use Boolean series, i.e. True/False values depending on whether your condition is met. Coupled with pd.Series.eq and f-strings (PEP 498, Python 3.6+), and using __getitem__ (or its syntactic sugar []) to allow string inputs, you can write your logic more readably:
unit_cols = ['item1','item2','item3']
for i in unit_cols:
    df[f'unit_{i}'] = df[f'{i}_existing'].eq('NO') & df[f'{i}_sold'].eq('YES')
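As a quick check, here is the loop above run on a small made-up DataFrame (the data and the item1/item2 prefixes are illustrative assumptions). Each new column holds Booleans, with False rather than NaN where the condition fails:

```python
import pandas as pd

# Illustrative data following the question's '<item>_existing' / '<item>_sold' naming
df = pd.DataFrame({
    'item1_existing': ['NO', 'YES', 'NO'],
    'item1_sold':     ['YES', 'YES', 'NO'],
    'item2_existing': ['NO', 'NO', 'YES'],
    'item2_sold':     ['NO', 'YES', 'YES'],
})

unit_cols = ['item1', 'item2']
for i in unit_cols:
    # .eq gives an element-wise Boolean Series; & combines the two row by row
    df[f'unit_{i}'] = df[f'{i}_existing'].eq('NO') & df[f'{i}_sold'].eq('YES')

print(df[['unit_item1', 'unit_item2']])
```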
If you need integers (1/0) instead of Boolean values, you can convert each new column via astype, for example inside the same loop:
df[f'unit_{i}'] = df[f'unit_{i}'].astype(int)
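The question also asks about finding the columns by the word they end with. Here is a sketch that derives the prefixes from df.columns instead of listing them by hand, and writes 1/0 integers directly. It assumes every '<prefix>_existing' column has a matching '<prefix>_sold' column; the sample data, including the unrelated 'store' column, is made up.

```python
import pandas as pd

df = pd.DataFrame({
    'item1_existing': ['NO', 'YES', 'NO'],
    'item1_sold':     ['YES', 'YES', 'NO'],
    'item2_existing': ['NO', 'NO', 'YES'],
    'item2_sold':     ['NO', 'YES', 'YES'],
    'store':          ['A', 'B', 'C'],  # unrelated column, skipped by the suffix filter
})

suffix = '_existing'
# Keep the prefix of every column that ends with the suffix
unit_cols = [c[:-len(suffix)] for c in df.columns if c.endswith(suffix)]

for i in unit_cols:
    # Boolean condition converted straight to 1/0 integers
    cond = df[f'{i}_existing'].eq('NO') & df[f'{i}_sold'].eq('YES')
    df[f'unit_{i}'] = cond.astype(int)

print(df)
```

The list of prefixes is built before the loop starts, so adding the unit_ columns inside the loop does not affect which columns are processed.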