Reputation: 549
I have a problem with replacing values in a pandas dataframe.
I want to search in a column ('URL') of a dataframe if it contains several string.
If this is true, I want to replace the value of another column in the dataframe but on the SAME LINE. If a string is found in a url from the 'URL' column I want to write that string on the same line in the column 'Model' and to write 'Samsung' for example on the column 'Brand'
For the moment, when an if condition for contains is true, it replaces all my values on the other columns, and I don't want that.
Python Code:
import pandas as pd
dataframe_initial = pd.DataFrame()
dataframe_initial = pd.read_excel('tele2.xlsx')
dataframe_initial['Model'] = ""
dataframe_initial['Brand'] = ""
str1 = 'galaxy-S9'
str2 = 'note-9'
str3 = 'galaxy-a6'
str4 = 'Huawei'
str5 = 'P20'
str6 = 'Apple'
str7 = 'Iphone-X'
for url in dataframe_initial['URL']:
if str1 in url:
dataframe_initial['Model'] = str(str1)
dataframe_initial['Brand'] = str('Samsung')
if str3 in url:
dataframe_initial['Model'] = str(str3)
dataframe_initial['Brand'] = str('Samsung')
if str2 in url:
dataframe_initial['Model'] = str(str2)
dataframe_initial['Brand'] = str('Samsung')
Upvotes: 1
Views: 1188
Reputation: 4792
Give this a try:
def pair(x):
if str1 in x['URL']:
x['Model'] = str(str1)
x['Brand'] = str('Samsung')
if str2 in x['URL']:
x['Model'] = str(str2)
x['Brand'] = str('Samsung')
if str3 in x['URL']:
x['Model'] = str(str3)
x['Brand'] = str('Samsung')
return x
dataframe_initial.apply(pair, axis = 1)
jpp has written a more efficient solution. Here's one more method similar to jpp's solution:
for value in values:
df['Model'] = np.where(df['URL'].str.contains(value), value, df['Model'])
df['Brand'] = np.where(df['URL'].str.isin(values), 'Samsung', df['Brand'])
Upvotes: 1
Reputation: 13401
You need:
dataframe_initial['Model'] = None
dataframe_initial['Brand'] = None
kw = [str1, str2, str3]
for i in kw:
dataframe_initial['Model'] = np.where(dataframe_initial['URL'].str.contains(i), i, dataframe_initial['Model'] )
dataframe_initial['Brand'] = np.where(dataframe_initial['Model'].isin(kw), 'Samsung', dataframe_initial['Brand'])
Upvotes: 1
Reputation: 164623
First you should avoid creating a variable number of variables. You can use list
instead:
values = ['galaxy-S9', 'note-9', 'galaxy-a6', 'Huawei', 'P20', 'Apple', 'Iphone-X']
Next you are iterating rows and, while doing so, updating an entire series each time you iterate a row. This is inefficient and incorrect. A better idea is to iterate your list of values and use Pandas Boolean indexing:
for value in values:
mask = df['URL'].str.contains(value, regex=False)
df.loc[mask, 'Model'] = value
df.loc[mask, 'Brand'] = 'Samsung'
Note you don't need to call str
on objects which are already strings.
Upvotes: 5