Reputation: 63
How can I use the str.contains() method to check a column if it contains specific strings and assign a value if true in a different column? Essentially, I'm trying to mimic a CASE WHEN LIKE THEN syntax in SQL but in pandas. Really new to python and pandas and would appreciate any help! Essentially, I want to search 'Source' for either video, audio, default, and if found, then Type would be video, audio, default accordingly. I hope this makes sense!
Source Type
video1393x2352_high video
audiowefxwrwf_low audio
default2325_none default
23234_audio audio
Upvotes: 0
Views: 2742
Reputation: 5121
Use the str.extract method ... takes a regular expression as an argument ... returns matched group as a string ...
df['Type'] = df.Source.str.extract('(video|audio|default)')
For some case sensitivity you could add ...
df['Type'] = df.Source.str.lower().str.extract('(video|audio|default)')
Example, including a non match follows ...
In [24]: %paste
import pandas as pd
data = """
Source
video1393x2352_high
audiowefxwrwf_low
default2325_none
23234_audio
complete_crap
AUDIO_upper_case_test"""
from StringIO import StringIO # import from io for python 3
df = pd.read_csv(StringIO(data), header=0, index_col=None)
df['Type'] = df.Source.str.lower().str.extract('(video|audio|default)')
## -- End pasted text --
In [25]: df
Out[25]:
Source Type
0 video1393x2352_high video
1 audiowefxwrwf_low audio
2 default2325_none default
3 23234_audio audio
4 complete_crap NaN
5 AUDIO_upper_case_test audio
Upvotes: 4
Reputation: 1468
Try something like:-
import re
input_values = ['video1393x2352_high', 'audiowefxwrwf_low', 'default2325_none', '23234_audio']
pattern = re.compile('audio|video|default')
res_dict = {}
for input_val in input_values:
type = pattern.findall(input_val)
if type:
res_dict[input_val] = type[0]
print res_dict #{'23234_audio': 'audio', 'audiowefxwrwf_low': 'audio', 'video1393x2352_high': 'video', 'default2325_none': 'default'}
Upvotes: 0
Reputation: 6606
Try using numpy.where
or pandas.DataFrame.where
. Both take a boolean array and conditionally assign based on that.
In [4]: np.where([True, False, True], 3, 4)
Out[4]: array([3, 4, 3])
http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.where.html
You would construct the boolean array using str.contains, and then pass it to the where method.
Upvotes: 0