PyRar
PyRar

Reputation: 549

Pandas replace the value of a column in dataframe only where if condition is true

I have a problem with replacing values in a pandas dataframe.

I want to search in a column ('URL') of a dataframe if it contains several string.

If this is true, I want to replace the value of another column in the dataframe but on the SAME LINE. If a string is found in a url from the 'URL' column I want to write that string on the same line in the column 'Model' and to write 'Samsung' for example on the column 'Brand'

For the moment, when an if condition for contains is true, it replaces all my values on the other columns, and I don't want that.

Python Code:

import pandas as pd

dataframe_initial = pd.DataFrame()
dataframe_initial = pd.read_excel('tele2.xlsx')
dataframe_initial['Model'] = ""
dataframe_initial['Brand'] = ""

str1 = 'galaxy-S9'
str2 = 'note-9'
str3 = 'galaxy-a6'
str4 = 'Huawei'
str5 = 'P20'
str6 = 'Apple'
str7 = 'Iphone-X'

for url in dataframe_initial['URL']:
    if str1 in url:
        dataframe_initial['Model'] = str(str1)
        dataframe_initial['Brand'] = str('Samsung')
    if str3 in url:
        dataframe_initial['Model'] = str(str3)
        dataframe_initial['Brand'] = str('Samsung')
    if str2 in url:
        dataframe_initial['Model'] = str(str2)
        dataframe_initial['Brand'] = str('Samsung')

Upvotes: 1

Views: 1188

Answers (3)

Mohit Motwani
Mohit Motwani

Reputation: 4792

Give this a try:

def pair(x):
if str1 in x['URL']:
    x['Model'] = str(str1)
    x['Brand'] = str('Samsung')
if str2 in x['URL']:
    x['Model'] = str(str2)
    x['Brand'] = str('Samsung')
if str3 in x['URL']:
    x['Model'] = str(str3)
    x['Brand'] = str('Samsung')

return x

dataframe_initial.apply(pair, axis = 1)

jpp has written a more efficient solution. Here's one more method similar to jpp's solution:

for value in values:
    df['Model'] = np.where(df['URL'].str.contains(value), value, df['Model'])
df['Brand'] = np.where(df['URL'].str.isin(values), 'Samsung', df['Brand'])

Upvotes: 1

Sociopath
Sociopath

Reputation: 13401

You need:

dataframe_initial['Model'] = None
dataframe_initial['Brand'] = None


kw = [str1, str2, str3]
for i in kw:
    dataframe_initial['Model'] = np.where(dataframe_initial['URL'].str.contains(i), i, dataframe_initial['Model'] )

dataframe_initial['Brand'] = np.where(dataframe_initial['Model'].isin(kw), 'Samsung', dataframe_initial['Brand'])

Upvotes: 1

jpp
jpp

Reputation: 164623

First you should avoid creating a variable number of variables. You can use list instead:

values = ['galaxy-S9', 'note-9', 'galaxy-a6', 'Huawei', 'P20', 'Apple', 'Iphone-X']

Next you are iterating rows and, while doing so, updating an entire series each time you iterate a row. This is inefficient and incorrect. A better idea is to iterate your list of values and use Pandas Boolean indexing:

for value in values:
    mask = df['URL'].str.contains(value, regex=False)
    df.loc[mask, 'Model'] = value
    df.loc[mask, 'Brand'] = 'Samsung'

Note you don't need to call str on objects which are already strings.

Upvotes: 5

Related Questions