Dilyana Flower

Reputation: 125

Conditional If Statement: If value contains string then set another column equal to string

I am writing a Python 3 script.


I have a column 'original_title' containing various film titles, including all the Star Wars films (plus the episode name) and all the Star Trek films (plus the episode name). I want to create a new column that shows only 'star trek' (without the episode name), 'star wars', or 'na'.

This is my code for the new column:

df['Trek_Wars'] = pd.np.where(df.original_title.str.contains("Star Wars"), "star_wars", 
              pd.np.where(df.original_title.str.contains("Star Trek"), "star_trek"))

However, it doesn't work:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-5472b36a2193> in <module>()
      1 df['Trek_Wars'] = pd.np.where(df.original_title.str.contains("Star Wars"), "star_wars",
----> 2                    pd.np.where(df.original_title.str.contains("Star Trek"), "star_trek"))

ValueError: either both or neither of x and y should be given

What should I do?

Upvotes: 1

Views: 5424

Answers (2)

Kranthi Kiran

Reputation: 128

In your example both values, "Star Wars" and "Star Trek", have the same number of characters (9), so you can simply take the first 9 characters of each title. This only works when the title starts with the series name; for finer parsing of that column you will need a better method.

# take the first 9 characters of every title (vectorized, no loop needed)
df['Film_Series'] = df['original_title'].str[:9]
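
As a rough sketch of a more robust approach, assuming the series name may appear anywhere in the title, str.extract with a regular expression could be used instead of slicing. The sample titles below are made up for illustration and the column name 'Film_Series' is just the one used above:

```python
import pandas as pd

# Hypothetical sample data; the titles are invented for this sketch.
df = pd.DataFrame({'original_title': [
    'Star Wars: A New Hope',
    'The Making of Star Trek',
    'Alien',
]})

# Pull out whichever series name occurs in the title (NaN when neither does).
series = df['original_title'].str.extract(r'(Star Wars|Star Trek)', expand=False)

# Normalise to lower-case labels with underscores and mark non-matches as 'na'.
df['Film_Series'] = (series.str.lower()
                           .str.replace(' ', '_', regex=False)
                           .fillna('na'))
```

Unlike slicing, this does not depend on where the series name sits in the title or on how long it is.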

Upvotes: 0

jpp

Reputation: 164673

I assume you are using Pandas. pd.np is only an alias for NumPy in older pandas versions (it was deprecated in pandas 1.0), so it is cleaner to import NumPy yourself and call np.where, which you can use for your task:

df['Trek_Wars'] = np.where(df['original_title'].str.contains('Star Wars'),
                           'star_wars', 'na')

Notice we have to provide values for when the condition is met and for when the condition is not met. For multiple conditions, you can use pd.DataFrame.loc:

# set default value
df['Trek_Wars'] = 'na'

# update according to conditions
df.loc[df['original_title'].str.contains('Star Wars'), 'Trek_Wars'] = 'star_wars'
df.loc[df['original_title'].str.contains('Star Trek'), 'Trek_Wars'] = 'star_trek'
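
For comparison, the nested call from the question also works once the inner np.where is given a fallback value as well. This is a minimal sketch on made-up titles, not the only way to write it:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the titles are invented for this sketch.
df = pd.DataFrame({'original_title': [
    'Star Wars: The Empire Strikes Back',
    'Star Trek: First Contact',
    'Blade Runner',
]})

is_wars = df['original_title'].str.contains('Star Wars')
is_trek = df['original_title'].str.contains('Star Trek')

# Every np.where call gets all three arguments: condition, value if True,
# value if False. The inner call supplies the final 'na' fallback.
df['Trek_Wars'] = np.where(is_wars, 'star_wars',
                           np.where(is_trek, 'star_trek', 'na'))
```

The original ValueError came from the inner call, which had a condition and a "true" value but no "false" value.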

You can simplify your logic further with a dictionary mapping:

# map search string to update string
mapping = {'Star Wars': 'star_wars', 'Star Trek': 'star_trek'}

# iterate mapping items
for k, v in mapping.items():
    df.loc[df['original_title'].str.contains(k), 'Trek_Wars'] = v
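
If you would rather avoid the explicit loop, one possible alternative is np.select, which takes a list of conditions, a matching list of values, and a default. The sketch below reuses the same mapping on made-up titles; na=False is an added assumption so that missing titles count as "no match":

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the titles are invented for this sketch.
df = pd.DataFrame({'original_title': [
    'Star Wars: Return of the Jedi',
    'Star Trek: Generations',
    'The Matrix',
]})

# map search string to update string
mapping = {'Star Wars': 'star_wars', 'Star Trek': 'star_trek'}

# One boolean mask per search string; missing titles are treated as False.
conditions = [df['original_title'].str.contains(k, regex=False, na=False)
              for k in mapping]
choices = list(mapping.values())

# The first matching condition wins; rows matching none get the default.
df['Trek_Wars'] = np.select(conditions, choices, default='na')
```

Note the difference in precedence: np.select uses the first matching condition, while the loop above lets the last matching key overwrite earlier ones.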

Upvotes: 4
