Reputation: 5907
This continues my previous question on Stack Overflow about searching for a matching string pattern in a pandas DataFrame column.
Suppose I have a DataFrame:
name genre
satya |ACTION|DRAMA|IC|
satya |COMEDY|DRAMA|SOCIAL|MUSIC|
abc |DRAMA|ACTION|BIOPIC|
xyz |ACTION||ROMANCE|DRAMA|
def |ACTION|SPORT|COMEDY|IC|
ghj |IC|ACTIONDRAMA|NOACTION|
Using the answer to my previous question, I can search for a single genre (e.g. IC) when it exists on its own in the genre column and not as part of another genre value (such as MUSIC or BIOPIC).
Now I want to find rows where both ACTION and DRAMA are present in the genre column, in any order, each as a separate genre rather than as part of a longer string.
So I need rows 1, 3 and 4 in the output:
name genre
satya |ACTION|DRAMA|IC| # both present, adjacent
# row 2 is excluded: only DRAMA is present, not ACTION
abc |DRAMA|ACTION|BIOPIC| # both present, adjacent, in the opposite order
xyz |ACTION||ROMANCE|DRAMA| # both present, not adjacent
# row 5 is excluded: DRAMA is not present
# row 6 is excluded: neither appears as a separate genre (only as part of a longer string)
I tried something like:
x = df[df['genre'].str.contains(r'\|ACTION\|DRAMA\|')]
# this returns only row 1 (ACTION and DRAMA adjacent and in the order ACTION -> DRAMA)
Could someone suggest what to change or add here to get the result I need?
Upvotes: 1
Views: 1988
Reputation: 1001
I'm not really sure about this answer because I can't test it here, but try this pattern:
(\|ACTION|\|DRAMA).*?(\|ACTION|\|DRAMA)
Hope it helps.
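A variant of the single-pattern idea, as a minimal sketch rather than a tested pandas solution: two lookaheads, each requiring one genre wrapped in its `|` delimiters, so order and adjacency do not matter. The `rows` list and the `BOTH` name are illustrative and not from the question; the sketch uses only the standard `re` module.

```python
import re

# Sample genre strings from the question
# (row 4 spelled DRAMA, as the expected output implies).
rows = [
    "|ACTION|DRAMA|IC|",
    "|COMEDY|DRAMA|SOCIAL|MUSIC|",
    "|DRAMA|ACTION|BIOPIC|",
    "|ACTION||ROMANCE|DRAMA|",
    "|ACTION|SPORT|COMEDY|IC|",
    "|IC|ACTIONDRAMA|NOACTION|",
]

# Each lookahead scans from the start of the string for one delimited genre;
# both must succeed, in either order, for the string to match.
BOTH = re.compile(r"^(?=.*\|ACTION\|)(?=.*\|DRAMA\|)")

matches = [bool(BOTH.search(s)) for s in rows]
print(matches)
```

On a plain object-dtype column the same pattern string should be usable with `str.contains`, which searches with `re.search` semantics; other string backends may not support lookaheads.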
Upvotes: 0
Reputation: 862406
I think you can use str.contains with two conditions combined with AND (&):
print(df)
name genre
0 satya |ACTION|DRAMA|IC|
1 satya |COMEDY|DRAMA|SOCIAL|MUSIC|
2 abc |DRAMA|ACTION|BIOPIC|
3 xyz |ACTION||ROMANCE|DRAMA|
4 def |ACTION|SPORT|COMEDY|IC|
5 ghj |IC|ACTIONDRAMA|NOACTION|
print(df['genre'].str.contains(r'\|ACTION\|') & df['genre'].str.contains(r'\|DRAMA\|'))
0 True
1 False
2 True
3 True
4 False
5 False
Name: genre, dtype: bool
print(df[df['genre'].str.contains(r'\|ACTION\|') & df['genre'].str.contains(r'\|DRAMA\|')])
name genre
0 satya |ACTION|DRAMA|IC|
2 abc |DRAMA|ACTION|BIOPIC|
3 xyz |ACTION||ROMANCE|DRAMA|
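For reference, here is a self-contained sketch of the same approach that rebuilds the sample frame and generalizes to any list of required genres. The helper name `has_all_genres` is hypothetical, and `regex=False` is a design choice of this sketch: it treats `|GENRE|` as a literal substring so the pipes need no escaping.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|DRAMA|SOCIAL|MUSIC|",
        "|DRAMA|ACTION|BIOPIC|",
        "|ACTION||ROMANCE|DRAMA|",
        "|ACTION|SPORT|COMEDY|IC|",
        "|IC|ACTIONDRAMA|NOACTION|",
    ],
})


def has_all_genres(genres, required):
    """Return a boolean mask: True where every required genre appears as |GENRE|."""
    mask = pd.Series(True, index=genres.index)
    for g in required:
        # regex=False: match the literal text "|GENRE|", delimiters included.
        mask &= genres.str.contains(f"|{g}|", regex=False)
    return mask


result = df[has_all_genres(df["genre"], ["ACTION", "DRAMA"])]
print(result)
```

Because each condition is checked separately and the masks are combined with `&`, the order and adjacency of the genres in the string do not matter.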
Upvotes: 2