Satya

Reputation: 5907

Searching for a string pattern in a DataFrame column in pandas

This continues my previous question on Stack Overflow, "searching matching string pattern from dataframe column in python pandas".

Suppose I have a DataFrame:

 name         genre
 satya      |ACTION|DRAMA|IC|
 satya      |COMEDY|DRAMA|SOCIAL|MUSIC|
 abc        |DRAMA|ACTION|BIOPIC|
 xyz        |ACTION||ROMANCE|DRAMA|
 def        |ACTION|SPORT|COMEDY|IC|
 ghj        |IC|ACTIONDRAMA|NOACTION|

From the answer to my last question, I am able to search for a single genre (e.g. IC) when it exists on its own in the genre column, and not as part of another genre value (such as MUSIC or BIOPIC).

Now I want to find the rows where ACTION and DRAMA are both present in the genre column, not necessarily in any particular order, and each as a standalone genre rather than as part of a longer string.

So I need rows 1, 3 and 4 in the output:

 name         genre
 satya      |ACTION|DRAMA|IC|            # both present, adjacent
 # row 2 is excluded: only DRAMA is present, not ACTION
 abc        |DRAMA|ACTION|BIOPIC|        # both present, adjacent, in the other order
 xyz        |ACTION||ROMANCE|DRAMA|      # both present, not adjacent
 # row 5 is excluded: DRAMA is not present
 # row 6 is excluded: neither is a standalone genre (each appears only inside a longer string)

I tried something like

 x = df[df['genre'].str.contains(r'\|ACTION\|DRAMA\|')]
 # returns only row 1 (ACTION and DRAMA adjacent, in the order ACTION -> DRAMA)

Could somebody suggest what I should change or add here to get the rows I need?

Upvotes: 1

Views: 1988

Answers (2)

JanLeeYu

Reputation: 1001

I'm not really sure about this answer because I can't test it here, but try this pattern:

(\|ACTION|\|DRAMA).*?(\|ACTION|\|DRAMA)

Hope it helps.
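
One limitation worth noting: the pattern above has no closing `|` after each name and accepts the same genre twice, so it would also match strings such as `|ACTION|X|ACTION|` or `|ACTIONS|DRAMATIC|`. A minimal sketch of a single-regex alternative using two lookaheads follows. The DataFrame is rebuilt from the question's sample (with DRAMA spelled as in the expected output), and the names `pattern`, `lookahead_mask` and `lookahead_result` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed spelling: DRAMA in row "xyz").
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|DRAMA|SOCIAL|MUSIC|",
        "|DRAMA|ACTION|BIOPIC|",
        "|ACTION||ROMANCE|DRAMA|",
        "|ACTION|SPORT|COMEDY|IC|",
        "|IC|ACTIONDRAMA|NOACTION|",
    ],
})

# Each lookahead scans independently from the start of the string, so the
# two pipe-delimited genres may appear in any order and need not be adjacent.
pattern = r"^(?=.*\|ACTION\|)(?=.*\|DRAMA\|)"

lookahead_mask = df["genre"].str.contains(pattern)
lookahead_result = df[lookahead_mask]
print(lookahead_result)  # keeps the rows with index 0, 2 and 3
```

Because the lookaheads consume no characters, the pipe shared by `|ACTION|DRAMA|` can satisfy both conditions.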

Upvotes: 0

jezrael

Reputation: 862406

I think you can call str.contains twice and combine the two conditions with AND (&):

print(df)
    name                        genre
0  satya            |ACTION|DRAMA|IC|
1  satya  |COMEDY|DRAMA|SOCIAL|MUSIC|
2    abc        |DRAMA|ACTION|BIOPIC|
3    xyz      |ACTION||ROMANCE|DRAMA|
4    def     |ACTION|SPORT|COMEDY|IC|
5    ghj    |IC|ACTIONDRAMA|NOACTION|

print(df['genre'].str.contains(r'\|ACTION\|') & df['genre'].str.contains(r'\|DRAMA\|'))
0     True
1    False
2     True
3     True
4    False
5    False
Name: genre, dtype: bool

print(df[df['genre'].str.contains(r'\|ACTION\|') & df['genre'].str.contains(r'\|DRAMA\|')])
    name                    genre
0  satya        |ACTION|DRAMA|IC|
2    abc    |DRAMA|ACTION|BIOPIC|
3    xyz  |ACTION||ROMANCE|DRAMA|
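
If the list of required genres grows, chaining one `str.contains` per genre gets verbose. As a rough sketch of an alternative (not part of the approach above), each value can be split on `|` and compared as a set. The helper `has_all_genres` and the variable names are hypothetical, and the DataFrame is rebuilt here only so the example runs on its own.

```python
import pandas as pd

# Same sample data as printed above, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|DRAMA|SOCIAL|MUSIC|",
        "|DRAMA|ACTION|BIOPIC|",
        "|ACTION||ROMANCE|DRAMA|",
        "|ACTION|SPORT|COMEDY|IC|",
        "|IC|ACTIONDRAMA|NOACTION|",
    ],
})


def has_all_genres(genre_string, required):
    """Return True if every required genre is a whole pipe-delimited token."""
    # Splitting on "|" yields whole tokens only; the empty strings produced by
    # the leading, trailing and doubled pipes are harmless.
    tokens = set(genre_string.split("|"))
    return set(required) <= tokens


required_genres = ["ACTION", "DRAMA"]  # extend this list as needed
token_mask = df["genre"].apply(lambda s: has_all_genres(s, required_genres))
token_result = df[token_mask]
print(token_result)  # keeps the rows with index 0, 2 and 3
```

This avoids regular expressions entirely, so genre names containing regex metacharacters need no escaping; the trade-off is a Python-level function call per row.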

Upvotes: 2
