Reputation: 99
I have a list of different colours, all stored as strings.
Preferredcolours = ['red','yellow','green', 'blue']
I have a pandas DataFrame that contains information about cars. One of its columns, DfCar['colour'], holds the colour of each car. I want to add a new column named PreferredMathcing that equals 1 if the value in the colour column matches one of the colours in the list, and 0 otherwise. How can I use a for loop to solve this?
Ideally, the result would look like this:
+=================+============================+
| DfCar['colour'] | DfCar['PreferredMathcing'] |
+=================+============================+
| white           | 0                          |
+-----------------+----------------------------+
| yellow          | 1                          |
+-----------------+----------------------------+
| black           | 0                          |
+-----------------+----------------------------+
| purple          | 0                          |
+-----------------+----------------------------+
| green           | 1                          |
+-----------------+----------------------------+
Upvotes: 1
Views: 105
Reputation: 13393
You can use .isin(), which returns a Series with True/False for each row depending on whether its value is in a list of values. Then use .astype(int) to get 1/0 instead.
Try this:
import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red','yellow','green', 'blue']
df["PreferredMathcing"] = df['colour'].isin(Preferredcolours).astype(int)
print(df)
Output:
   colour  PreferredMathcing
0   white                  0
1  yellow                  1
2   black                  0
3  purple                  0
4   green                  1
NOTE:
A solution built on a pure library function will likely outperform one that uses apply with custom Python logic.
Benchmarking the two against each other on my machine suggests .isin() is almost 8x faster:
with '.isin()': 1.0591506958007812
with '.apply()': 8.234664678573608
ratio: 7.774780974248154
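The benchmark script itself is not shown above, so here is a minimal sketch of how such a comparison could be set up with timeit. The data size, the random seed, the number of repetitions and the helper names with_isin and with_apply are all assumptions made for illustration; the timings you get will differ from the figures quoted above.

```python
import timeit

import numpy as np
import pandas as pd

Preferredcolours = ['red', 'yellow', 'green', 'blue']

# Hypothetical test data: 100,000 randomly chosen colours (size and seed are arbitrary).
rng = np.random.default_rng(0)
colours = ['white', 'yellow', 'black', 'purple', 'green', 'red', 'blue']
df = pd.DataFrame({'colour': rng.choice(colours, size=100_000)})

def with_isin():
    # Vectorised membership test, converted from True/False to 1/0.
    return df['colour'].isin(Preferredcolours).astype(int)

def with_apply():
    # Same result, computed by calling a Python lambda once per element.
    return df['colour'].apply(lambda x: 1 if x in Preferredcolours else 0)

# Check that both approaches agree before timing them.
assert (with_isin() == with_apply()).all()

t_isin = timeit.timeit(with_isin, number=10)
t_apply = timeit.timeit(with_apply, number=10)
print("with '.isin()':", t_isin)
print("with '.apply()':", t_apply)
print("ratio:", t_apply / t_isin)
```

The exact ratio depends on the machine, the pandas version and the size of the data, so treat any single number as indicative only.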
Upvotes: 1
Reputation: 563
You can use np.where like below:
import pandas as pd
import numpy as np
DfCar = pd.DataFrame.from_dict({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red','yellow','green', 'blue']
DfCar['PreferredMathcing'] = np.where(DfCar['colour'].isin(Preferredcolours), 1, 0)
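np.where is also handy when the two outcomes are something other than 1 and 0. As a small sketch, the same boolean mask can drive both the numeric flag and a text label; the column name PreferredLabel and the labels 'preferred' and 'other' are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd

DfCar = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red', 'yellow', 'green', 'blue']

# Boolean mask: True where the colour is one of the preferred colours.
mask = DfCar['colour'].isin(Preferredcolours)

# np.where(condition, value_if_true, value_if_false) works element-wise.
DfCar['PreferredMathcing'] = np.where(mask, 1, 0)
DfCar['PreferredLabel'] = np.where(mask, 'preferred', 'other')  # hypothetical extra column
print(DfCar)
```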
Upvotes: 1
Reputation: 498
Assuming DfCar is your DataFrame:
Preferredcolours = ['red','yellow','green', 'blue']
DfCar['PreferredMatching'] = DfCar['colour'].apply(lambda x: x in Preferredcolours)
This applies the lambda function to every element of the "colour" column: it checks whether the value is in Preferredcolours and returns True or False. Append .astype(int) if you need 1/0 instead.
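For completeness, a self-contained sketch of the apply approach that produces the 1/0 column asked for in the question. Converting the list to a set (the name preferred_set is hypothetical) is an optional tweak, assumed here to make each membership check cheaper when the list is long.

```python
import pandas as pd

DfCar = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red', 'yellow', 'green', 'blue']

# Hypothetical helper: a set gives constant-time membership checks.
preferred_set = set(Preferredcolours)

# apply() returns True/False per element; astype(int) turns that into 1/0.
DfCar['PreferredMathcing'] = DfCar['colour'].apply(lambda x: x in preferred_set).astype(int)
print(DfCar)
```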
Upvotes: 0
Reputation: 348
The following will give you the desired output:
def check_colour(x, Preferredcolours):
    # x is one row of the DataFrame; return 1 if its colour is preferred, else 0
    return 1 if x['colour'] in Preferredcolours else 0

DfCar['PreferredMathcing'] = DfCar.apply(check_colour, args=(Preferredcolours,), axis=1)
Upvotes: 1