Reputation: 99

Looping over a string list in Python

I have a list of different colours, all stored as strings.

Preferredcolours = ['red','yellow','green', 'blue']

I have a pandas DataFrame, DfCar, which contains information about cars. One of its columns, DfCar['colour'], holds the colour of each car. I want to create a new column in the DataFrame, named PreferredMathcing, which is 1 if the car's colour matches one of the colours in the list and 0 otherwise. How can I use a for loop to solve this?

Ideally, I want a result like this:

+=================+============================+
| DfCar['colour'] | DfCar['PreferredMathcing'] |
+=================+============================+
| white           |                          0 |
+-----------------+----------------------------+
| yellow          |                          1 |
+-----------------+----------------------------+
| black           |                          0 |
+-----------------+----------------------------+
| purple          |                          0 |
+-----------------+----------------------------+
| green           |                          1 |
+-----------------+----------------------------+
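For reference, a plain for loop that produces the table above might look like the sketch below. It assumes the colour column holds plain strings and rebuilds the sample data from the table so it runs on its own.

```python
import pandas as pd

# Sample data rebuilt from the table above
DfCar = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red', 'yellow', 'green', 'blue']

# Explicit for loop: collect one 1/0 flag per row, then assign the list as a column
flags = []
for colour in DfCar['colour']:
    flags.append(1 if colour in Preferredcolours else 0)

DfCar['PreferredMathcing'] = flags
print(DfCar)
```

This works, but a Python-level loop is usually slower than the vectorized pandas approaches.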

Upvotes: 1

Answers (4)

Adam.Er8

Reputation: 13393

You can use .isin(), which returns a boolean Series that is True for each row whose value is in the given list of values. Then use .astype(int) to turn True/False into 1/0.

Try this:

import pandas as pd

# Sample data matching the question
df = pd.DataFrame.from_dict({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red','yellow','green', 'blue']

# isin() gives True/False per row; astype(int) maps True -> 1, False -> 0
df["PreferredMathcing"] = df['colour'].isin(Preferredcolours).astype(int)

print(df)

output:

   colour  PreferredMathcing
0   white                  0
1  yellow                  1
2   black                  0
3  purple                  0
4   green                  1

NOTE:

A solution built on a vectorized library function will likely outperform one that uses apply with custom Python logic.

Benchmarking the two against each other on my machine suggests .isin() is almost 8x faster:

with '.isin()': 1.0591506958007812
with '.apply()': 8.234664678573608
ratio: 7.774780974248154
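The benchmark code itself isn't shown. Below is a minimal sketch of how one might run a similar comparison with the standard timeit module. The 100,000 random rows and 10 repetitions are arbitrary assumptions, not the original setup, so absolute times and the ratio will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 randomly chosen colours (size is an assumption)
rng = np.random.default_rng(0)
colours = ['white', 'yellow', 'black', 'purple', 'green', 'red', 'blue']
df = pd.DataFrame({'colour': rng.choice(colours, size=100_000)})
Preferredcolours = ['red', 'yellow', 'green', 'blue']

def with_isin():
    # Vectorized membership test, then booleans -> 1/0
    return df['colour'].isin(Preferredcolours).astype(int)

def with_apply():
    # Python-level check called once per element
    return df['colour'].apply(lambda x: 1 if x in Preferredcolours else 0)

# Make sure both approaches agree before timing them
assert with_isin().tolist() == with_apply().tolist()

n = 10  # number of repetitions (arbitrary)
t_isin = timeit.timeit(with_isin, number=n)
t_apply = timeit.timeit(with_apply, number=n)
print(f"with '.isin()': {t_isin}")
print(f"with '.apply()': {t_apply}")
print(f"ratio: {t_apply / t_isin}")
```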

Upvotes: 1

Wytamma Wirth

Reputation: 563

You can use np.where like below:

import pandas as pd
import numpy as np

DfCar = pd.DataFrame.from_dict({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red','yellow','green', 'blue']

DfCar['PreferredMathcing'] = np.where(DfCar['colour'].isin(Preferredcolours), 1, 0)
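np.where(condition, x, y) picks x wherever the condition is True and y elsewhere, so the same pattern also works for labels other than 1/0. A short sketch; the 'yes'/'no' labels and the Preferred column name are hypothetical, chosen only for illustration:

```python
import numpy as np
import pandas as pd

DfCar = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red', 'yellow', 'green', 'blue']

# Element-wise choice: 'yes' where the colour is preferred, 'no' otherwise
DfCar['Preferred'] = np.where(DfCar['colour'].isin(Preferredcolours), 'yes', 'no')
print(DfCar)
```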

Upvotes: 1

dustin-we

Reputation: 498

Assuming DfCar is your DataFrame:

Preferredcolours = ['red','yellow','green', 'blue']    
DfCar['PreferredMatching'] = DfCar['colour'].apply(lambda x: x in Preferredcolours)

This applies the lambda function to every element of the "colour" column; it simply checks whether the value is in Preferredcolours and returns True or False.
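Since the question asks for 1/0 rather than True/False, one option is to chain .astype(int) onto the result. A minimal sketch, with the sample data rebuilt from the question:

```python
import pandas as pd

DfCar = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red', 'yellow', 'green', 'blue']

# The lambda yields booleans; .astype(int) converts True/False to 1/0
DfCar['PreferredMatching'] = (
    DfCar['colour'].apply(lambda x: x in Preferredcolours).astype(int)
)
print(DfCar)
```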

Upvotes: 0

Suresh Mali

Reputation: 348

The following will give you the output:

def check_colour(x, Preferredcolours):
    # x is a single row, so look up its 'colour' value
    return 1 if x['colour'] in Preferredcolours else 0

DfCar['PreferredMathcing'] = DfCar.apply(check_colour, args=(Preferredcolours,), axis=1)
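With axis=1, DataFrame.apply passes each row to check_colour as a Series, which is why the function indexes x['colour']. A self-contained sketch that rebuilds the sample data from the question and checks the row-wise result against the vectorized .isin() approach:

```python
import pandas as pd

DfCar = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red', 'yellow', 'green', 'blue']

def check_colour(x, Preferredcolours):
    # x is one row (a Series) because apply is called with axis=1
    return 1 if x['colour'] in Preferredcolours else 0

# args passes the extra positional argument after the row
DfCar['PreferredMathcing'] = DfCar.apply(check_colour, args=(Preferredcolours,), axis=1)

# Sanity check: the row-wise result matches the .isin() result
expected = DfCar['colour'].isin(Preferredcolours).astype(int)
assert DfCar['PreferredMathcing'].tolist() == expected.tolist()
print(DfCar)
```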

Upvotes: 1
