Reputation: 501
my input:
mydata = (pd.DataFrame({'xyz':['Label','NG-ICV','NG-ICV','NG-ICV','ICV','ICV','ICV','NG-ICV','NG-ICV','Label',],
'foo':['Label','Label','Label','Label','Label','Label','Label','Label','Label','Label'],
'bar':['Label','Label','Label','Label','NG-ICV','NG-ICV','ICV','ICV','NG-ICV','NG-ICV']}))
My necessary code to calculate ratio between two patterns and finding in which column is a biggest ratio:
patNG = "NG-ICV"
pat = "ICV"
x=mydata.apply(lambda x: x.str.contains(patNG, regex=True).sum(), axis=0)
y=mydata.apply(lambda x: x.str.contains(pat, regex=True).sum(), axis=0)
z=x/y
z=z.dropna()
mask=z.index.values.tolist()
Now I want select row with index, within two condition column name from mask
and select only rows where patNG
and pat
is True(where row value equal to 'ICV' or 'NG-ICV'). So I expect for example for column xyz
like this:
xyz
4 NG-ICV
5 NG-ICCV
6 ICV
7 ICV
8 NG-ICV
9 NG-ICV
My code:
mydata.loc[ :,[mask[0] in i for i in mydata.columns] ]
And here I am stuck, how add one more condition where values of row also equal to my two patterns('pat' and 'patNG')?
P.S. I must to use regex and pattern, because my data is not one label such ICV its long string with other labels.
Upvotes: 0
Views: 123
Reputation: 260335
Following your logic, you should keep the initial detection of pattern before you sum. Then apply a mask on the detected columns to select your rows:
X = mydata.apply(lambda x: x.str.contains(patNG, regex=True), axis=0)
Y = mydata.apply(lambda x: x.str.contains(pat, regex=True), axis=0)
x = X.sum()
y = Y.sum()
z = (x/y).dropna()
mask = z.index.values.tolist()
mydata.loc[(X|Y)[mask].all(1)]
output:
xyz foo bar
4 ICV Label NG-ICV
5 ICV Label NG-ICV
6 ICV Label ICV
7 NG-ICV Label ICV
8 NG-ICV Label NG-ICV
Upvotes: 1