Reputation: 549
I have a list of strings, let's say:
fruit_list = ["apple", "banana", "coconut"]
And I have a pandas DataFrame, such as:
import pandas as pd
data = [['Apple farm', 10], ['Banana field', 15], ['Coconut beach', 14], ['corn field', 10]]
df = pd.DataFrame(data, columns = ['fruit_source', 'value'])
I want to populate a new column based on a text search of the existing column 'fruit_source'. For each row, the new column should hold whichever element of fruit_list is found in that row's 'fruit_source' value. One way of writing it is:
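df["fruit"] = "fruit not found"  # default for rows where no fruit matches
for index, row in df.iterrows():
    for fruit in fruit_list:
        # Compare in lowercase so that "apple" matches "Apple farm"
        if fruit in row['fruit_source'].lower():
            df.loc[index, 'fruit'] = fruit
            break  # keep the first match instead of overwriting it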
The result is a dataframe with a new column naming the fruit that each fruit source contains.
On a larger dataframe, though, this iteration becomes a performance problem: every row is checked against every list entry in Python, so the work grows with the number of rows times the number of fruits.
Is there a more efficient way to do this?
Upvotes: 4
Views: 2806
Reputation: 120391
Use str.extract with a regex pattern to avoid a loop:
import re
pattern = fr"({'|'.join(fruit_list)})"
df['fruit'] = df['fruit_source'].str.extract(pattern, flags=re.IGNORECASE) \
                                .fillna('fruit not found')
Output:
>>> df
    fruit_source  value            fruit
0     Apple farm     10            Apple
1   Banana field     15           Banana
2  Coconut beach     14          Coconut
3     corn field     10  fruit not found
>>> pattern
'(apple|banana|coconut)'
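Note that str.extract returns the text as it appears in 'fruit_source' ("Apple"), not the list entry ("apple"), and that a fruit name containing regex metacharacters would alter the pattern. A minimal sketch that handles both, assuming every entry in fruit_list is lowercase as in the question; re.escape, expand=False and the final str.lower() are additions for illustration, not part of the snippet above:

```python
import re
import pandas as pd

fruit_list = ["apple", "banana", "coconut"]
data = [['Apple farm', 10], ['Banana field', 15], ['Coconut beach', 14], ['corn field', 10]]
df = pd.DataFrame(data, columns=['fruit_source', 'value'])

# Escape each name so characters such as "." or "+" are matched literally
pattern = "(" + "|".join(re.escape(fruit) for fruit in fruit_list) + ")"

# expand=False returns a Series (not a one-column DataFrame) for a single group
matched = df['fruit_source'].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Lowercase the match to recover the list entry (assumes lowercase list entries)
df['fruit'] = matched.str.lower().fillna('fruit not found')
print(df)
```

If the list entries were mixed case, lowercasing would no longer recover them, and a lookup from the lowercased match back to the original entry would be needed instead.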
Upvotes: 5
Reputation: 168834
You can let Pandas do the work like so:
# Prime the column with the "fruit not found" value
df['fruit'] = "fruit not found"
for fruit in fruit_list:
    # Generate a boolean series of rows matching the fruit
    mask = df['fruit_source'].str.contains(fruit, case=False)
    # Replace those rows with the name of the fruit; assign the result back,
    # since inplace=True on df['fruit'] does not update df under copy-on-write
    df['fruit'] = df['fruit'].mask(mask, fruit)
print(df)
This then prints:
    fruit_source  value            fruit
0     Apple farm     10            apple
1   Banana field     15           banana
2  Coconut beach     14          coconut
3     corn field     10  fruit not found
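One thing to be aware of with the loop above: if a row contains more than one fruit, the last matching entry in fruit_list overwrites the earlier ones. A rough sketch of a variant in which the earliest list entry wins, by iterating the list in reverse. The extra row 'Apple and banana stand' is hypothetical and only there to show the tie-break; regex=False, which makes str.contains treat each name as a literal substring, is likewise an assumption about what is wanted:

```python
import pandas as pd

fruit_list = ["apple", "banana", "coconut"]
data = [['Apple farm', 10], ['Banana field', 15], ['Coconut beach', 14],
        ['corn field', 10], ['Apple and banana stand', 7]]  # last row is hypothetical
df = pd.DataFrame(data, columns=['fruit_source', 'value'])

# Default value for rows where nothing matches
df['fruit'] = "fruit not found"

# Walk the list backwards so the earliest entry is written last and wins
for fruit in reversed(fruit_list):
    # Literal, case-insensitive substring test over the whole column at once
    mask = df['fruit_source'].str.contains(fruit, case=False, regex=False)
    df.loc[mask, 'fruit'] = fruit

print(df)
```

The loop still runs once per fruit, not once per row, so the per-row work stays inside pandas.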
Upvotes: 6