Use regex in Python to rule out string

Question

I'm using pandas to clean the data as below:

import pandas as pd

s3 = pd.DataFrame({'title':["intermediate" ,"Basmati/sadri" ,"temperate japonica" ,"Temperate japonica" , "Japonica" ,"Tropical japonica" ,"Aromatic (basmati/sandri type" , "indica" , "Aus/boro" , "Aus" ,"aus" ,"japonica" , "tropical japnica", "" , "Indica" , "Intermediate type" ]})

s3.title.replace(r".*[Jj]ap(o)?nica$", "japonica" ,inplace=True,regex=True)

s3.title.replace(r"Indica", "indica" ,inplace=True,regex=True)

print(s3)

And I got:

                        title
0                    intermediate
1                   Basmati/sadri
2                        japonica
3                        japonica
4                        japonica
5                        japonica
6   Aromatic (basmati/sandri type
7                          indica
8                        Aus/boro
9                             Aus
10                            aus
11                       japonica
12                       japonica
13                               
14                         indica
15              Intermediate type

I want to replace every string that is not 'japonica' or 'indica' with 'others', like this:

if string not in  ['japonica', "indica"] :
    string = 'others'

But how can I do this with a regex, along these lines:

s3.title.replace(r"some regex", "others" ,inplace=True,regex=True)

2Cubed · Accepted Answer

The following should work. The pattern has three alternatives, separated by the or (|) operator:

1. A negative lookahead that matches any non-empty title that does not start with japonica or indica.
2. A branch for titles that do start with japonica or indica but are followed by at least one more character, so the title is not exactly japonica or indica.
3. The empty string.

s3.title.replace(r'^(?!japonica|indica).+$|^(japonica|indica).+$|^$', "others", inplace=True, regex=True)
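
To see how the three alternatives behave, the pattern can be tried outside pandas with the standard re module. This is only a minimal sketch: the sample values, the "japonica type" entry (added to exercise the second alternative), the to_others helper, and the one-branch COMPACT_PATTERN are illustrative assumptions, not part of the answer above.

```python
import re

# The pattern from the answer, compiled once.
ANSWER_PATTERN = re.compile(r"^(?!japonica|indica).+$|^(japonica|indica).+$|^$")

# Hypothetical one-branch alternative (not from the answer): putting `$`
# inside the lookahead rejects only the exact words 'japonica' and 'indica'.
COMPACT_PATTERN = re.compile(r"^(?!(?:japonica|indica)$).*$")

# Sample values as they look after the question's two normalisation steps.
# "japonica type" is an invented value that hits the second alternative.
titles = ["intermediate", "Basmati/sadri", "japonica", "indica",
          "Aus/boro", "aus", "", "japonica type", "Intermediate type"]


def to_others(pattern, value):
    """Return 'others' when the pattern matches the value, else the value."""
    return pattern.sub("others", value)


result = [to_others(ANSWER_PATTERN, t) for t in titles]
compact = [to_others(COMPACT_PATTERN, t) for t in titles]

print(result)
print(result == compact)  # both patterns should agree on these samples
```

If a regex is not strictly required, the membership test from the question can also be written directly in pandas, for example with Series.isin combined with Series.where, which avoids the pattern altogether.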
