Monirrad

Reputation: 483

How to filter rows based on part of a value in a pandas DataFrame

I have a pandas dataframe as below:

   +------+---+
   |Name  |age|
   +------+---+
   |Mona  |12 |
   +------+---+
   |Monika|25 |
   +------+---+
   |Tomas |3  |
   +------+---+
   |Ilyas |47 |
   +------+---+

Now I want to add a new column whose value depends on whether the Name column contains a given substring. For example, if the name contains Mon, the new value should be text, and if it contains as, the new value should be city.

So the output must be:

   +------+---+----+
   |Name  |age|new |
   +------+---+----+
   |Mona  |12 |text|
   +------+---+----+
   |Monika|25 |text|
   +------+---+----+
   |Tomas |3  |city|
   +------+---+----+
   |Ilyas |47 |city|
   +------+---+----+

I wrote the following code, but it didn't work:

df['new'] = np.nan    
df['new'] = df['new'].mask( 'Mon' in df['Name'], 'text')    
df['new'] = df['new'].mask( 'as' in df['Name'], 'city')

Upvotes: 0

Views: 697

Answers (4)

kofffii

Reputation: 21

df.loc[df['Name'].str.endswith('as'), 'new'] = 'city'
df.loc[df['Name'].str.startswith('Mon'), 'new'] = 'text'

The .str accessor has startswith and endswith methods for this kind of problem. There is probably a way to do it in a single line, but nothing comes to mind right now.

.loc is always helpful for selecting and updating data based on a condition.

Edit: Juan C is right, it should be str.contains rather than str.startswith. My bad.
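For reference, here is a minimal sketch of the same .loc approach with str.contains, as the edit suggests. It rebuilds the question's sample frame so it runs on its own. It also shows why the original attempt fails: the in operator on a Series checks the index labels, not the values. The variable name in_check is only for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Name": ["Mona", "Monika", "Tomas", "Ilyas"],
                   "age": [12, 25, 3, 47]})

# `in` on a Series looks at the index labels (0..3 here), not the values,
# so this is a single False rather than a per-row mask
in_check = "Mon" in df["Name"]

# str.contains returns a boolean mask with one entry per row
df.loc[df["Name"].str.contains("as"), "new"] = "city"
df.loc[df["Name"].str.contains("Mon"), "new"] = "text"

print(df)
```

Rows matching neither substring would be left as NaN in the new column.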

Upvotes: 0

gold_cy

Reputation: 14216

I think this would be your best bet, since it handles case-insensitive matches and leaves a missing value (None) where neither condition is met.

import numpy as np

     Name  Age
0    Mona   12
1  Monika   25
2   Tomas    3
3   Ilyas   47

# Use None rather than np.nan as the fallback: mixing strings with np.nan
# in np.where would coerce the missing value to the string 'nan'
default = np.where(df.Name.str.contains('as', case=False), 'city', None)
cond = np.where(df.Name.str.contains('Mon', case=False), 'text', default)

df.assign(new=cond)

     Name  Age   new
0    Mona   12  text
1  Monika   25  text
2   Tomas    3  city
3   Ilyas   47  city

Upvotes: 0

PMende

Reputation: 5460

If your logic is going to get complicated, you may want to encapsulate your mapping into a function:

def map_func(name):
    if 'Mon' in name:
        return 'text'
    elif 'as' in name:
        return 'city'

df['Name'].map(map_func)

Result:

0    text
1    text
2    city
3    city
Name: Name, dtype: object
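To get the column the question asks for, the mapped Series can be assigned back to the frame. The sketch below assumes the question's sample data. The default parameter and its value "other" are hypothetical additions for names that match neither substring (without them the function would return None).

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Name": ["Mona", "Monika", "Tomas", "Ilyas"],
                   "age": [12, 25, 3, 47]})

def map_func(name, default="other"):
    """Return a label based on which substring the name contains."""
    if "Mon" in name:
        return "text"
    elif "as" in name:
        return "city"
    # Hypothetical fallback for names matching neither substring
    return default

# Series.map calls map_func once per value; assign the result as a new column
df["new"] = df["Name"].map(map_func)

print(df)
```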

Upvotes: 1

Juan C

Reputation: 6132

This should do:

df['new']=np.where(df.Name.str.contains('Mon'), 'text', 'city')

np.where works like Excel's IF: where the condition is met it fills with 'text', otherwise it fills with 'city'.
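One consequence worth seeing in a small sketch: with a single np.where, every row that fails the condition gets the else value, whether or not it contains as. The extra row Peter is a hypothetical name added only to show this.

```python
import numpy as np
import pandas as pd

# Names from the question plus one hypothetical extra row ("Peter")
df = pd.DataFrame({"Name": ["Mona", "Monika", "Tomas", "Ilyas", "Peter"]})

# Element-wise if/else: 'text' where the mask is True, 'city' everywhere else
df["new"] = np.where(df["Name"].str.contains("Mon"), "text", "city")

print(df)
```

Peter ends up labelled city here, so this one-liner fits only when everything that is not a Mon match should get the second value.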

Upvotes: 0
