Reputation: 33
I read Excel data into a pandas DataFrame in which each row has two non-NaN values (all the others are NaN):
names | Unnamed:1 | Unnamed:2 | Unnamed:3 | ~ | Unnamed:19 | Unnamed:20 |
---|---|---|---|---|---|---|
NaN | NaN | 1.3 | NaN | ~(NaN) | 10.4 | NaN |
NaN | NaN | NaN | 2.7 | ~(NaN) | NaN | 12.7 |
~ | ~ | ~ | ~ | ~ | ~ | ~ |
name_ccdd | NaN | 1.3 | NaN | ~(NaN) | 9.3 | NaN |
~ | ~ | ~ | ~ | ~ | ~ | ~ |
name_yyzz | 0.5 | NaN | NaN | ~4.7~ | NaN | NaN |
I'd like to find a specific row whose name contains a word (e.g. "ccdd") and get its non-NaN values (e.g. 1.3 and 9.3) in a general way.
I made a boolean mask showing which rows contain the word (e.g. "ccdd"):
import pandas as pd
import numpy as np
filename= '~/data.xlsx'
df = pd.read_excel(filename, engine='openpyxl')
mask = df[df.columns[0]].str.contains('ccdd')
print(mask)
0 NaN
1 NaN
2 False
3 False
4 False
5 True
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
The row at index 5 contains the two float values I want, but I can't get any further.
In a previous question (Get row and column index of the value in Pandas df) I found a similar answer, but I don't know how to apply it here.
Upvotes: 1
Views: 2114
Reputation: 1348
This will get you the proper row index:
mask=df['names'].str.contains('ccdd')
mask=mask.replace(np.nan, False)  # NaN (rows without a name) counts as "no match"
list(df.names.index[mask])
Output:
[3]
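As a rough illustration on a made-up frame (the data below is hypothetical, not the asker's file), the labels returned this way can be fed straight back into df.loc to fetch the rows. The sketch uses .eq(True) instead of replace to turn the NaN entries into False; that is a swapped-in alternative, not the code above:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two unnamed rows followed by two named ones
df = pd.DataFrame({
    "names": [np.nan, np.nan, "name_ccdd", "name_yyzz"],
    "Unnamed:1": [1.3, 2.7, np.nan, 0.5],
})

# Rows without a name can come back as NaN rather than False;
# comparing with True leaves a plain boolean mask either way
mask = df["names"].str.contains("ccdd").eq(True)

row_labels = df.index[mask].tolist()
print(row_labels)          # [2]
print(df.loc[row_labels])  # the matching row(s)
```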
Upvotes: 1
Reputation: 24314
You can pass the na parameter to str.contains().
It sets the result for NaN entries to True or False, whichever you pass:
mask = df[df.columns[0]].str.contains('ccdd',na=False)
Then pass that mask to your DataFrame to select the matching rows:
df[mask]
#OR
df.loc[mask]
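To go from the matched row to its non-NaN values, one possible continuation is to select only the value columns and call dropna() on the row. The sketch below uses a small made-up frame in place of the Excel file (the column names and numbers are only stand-ins) and assumes the first column holds the names:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the Excel sheet; not the asker's real data
df = pd.DataFrame({
    "names": [np.nan, np.nan, "name_ccdd", "name_yyzz"],
    "Unnamed:1": [np.nan, np.nan, np.nan, 0.5],
    "Unnamed:2": [1.3, np.nan, 1.3, np.nan],
    "Unnamed:3": [np.nan, 2.7, np.nan, 4.7],
    "Unnamed:19": [10.4, np.nan, 9.3, np.nan],
    "Unnamed:20": [np.nan, 12.7, np.nan, np.nan],
})

mask = df[df.columns[0]].str.contains("ccdd", na=False)

# Keep only the value columns of the matching rows, then take the first match
row = df.loc[mask, df.columns[1:]].iloc[0]

found = row.dropna()            # Series holding only the non-NaN entries
values = found.tolist()         # the values themselves
columns = found.index.tolist()  # the columns they came from

print(values)   # [1.3, 9.3]
print(columns)  # ['Unnamed:2', 'Unnamed:19']
```

If several rows can match, iterating over df.loc[mask, df.columns[1:]].iterrows() and calling dropna() on each row would give one list per row.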
Upvotes: 2
Reputation: 38
To select a row or group of rows fulfilling a condition, you can use this:
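# na=False counts rows whose name is NaN as non-matches; NaN entries left in the mask can make the selection fail
subSelection=df[df["names"].str.contains("ccdd", na=False)]
To print the rows, you can use print(subSelection),
and to print the indices of the selected rows, you can use print(list(subSelection.index)).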
Upvotes: 1