gall3on
gall3on

Reputation: 63

Python Assign Value to New Column If Contains() Is True

How can I use the str.contains() method to check a column if it contains specific strings and assign a value if true in a different column? Essentially, I'm trying to mimic a CASE WHEN LIKE THEN syntax in SQL but in pandas. Really new to python and pandas and would appreciate any help! Essentially, I want to search 'Source' for either video, audio, default, and if found, then Type would be video, audio, default accordingly. I hope this makes sense!

Source                 Type
video1393x2352_high    video
audiowefxwrwf_low      audio
default2325_none       default
23234_audio            audio

Upvotes: 0

Views: 2742

Answers (3)

Mark Graph
Mark Graph

Reputation: 5121

Use the str.extract method ... takes a regular expression as an argument ... returns matched group as a string ...

df['Type'] = df.Source.str.extract('(video|audio|default)')

For some case sensitivity you could add ...

df['Type'] = df.Source.str.lower().str.extract('(video|audio|default)')

Example, including a non match follows ...

In [24]: %paste
import pandas as pd

data = """
Source
video1393x2352_high
audiowefxwrwf_low
default2325_none
23234_audio
complete_crap
AUDIO_upper_case_test"""

from StringIO import StringIO # import from io for python 3
df = pd.read_csv(StringIO(data), header=0, index_col=None)

df['Type'] = df.Source.str.lower().str.extract('(video|audio|default)')
## -- End pasted text --

In [25]: df
Out[25]: 
                  Source     Type
0    video1393x2352_high    video
1      audiowefxwrwf_low    audio
2       default2325_none  default
3            23234_audio    audio
4          complete_crap      NaN
5  AUDIO_upper_case_test    audio

Upvotes: 4

Barun Sharma
Barun Sharma

Reputation: 1468

Try something like:-

import re
input_values = ['video1393x2352_high', 'audiowefxwrwf_low', 'default2325_none', '23234_audio']
pattern = re.compile('audio|video|default') 
res_dict = {}
for input_val in input_values:
    type = pattern.findall(input_val)
    if type:
        res_dict[input_val] = type[0]

print res_dict #{'23234_audio': 'audio', 'audiowefxwrwf_low': 'audio', 'video1393x2352_high': 'video', 'default2325_none': 'default'}

Upvotes: 0

jwilner
jwilner

Reputation: 6606

Try using numpy.where or pandas.DataFrame.where. Both take a boolean array and conditionally assign based on that.

In [4]: np.where([True, False, True], 3, 4)
Out[4]: array([3, 4, 3])

http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.where.html

You would construct the boolean array using str.contains, and then pass it to the where method.

Upvotes: 0

Related Questions