Reputation: 41
I have a data frame where a column (column B) can contain a letters, a number or nothing at all. Lets say the data frame is:
A B C
1 2 Dog
3 C Bird
30 nan Cat
11 4.1 Wolf
And I want to get rows conditionally, based on whether there is a number in column B:
A B C
1 2 Dog
11 4.1 Wolf
I have found that I can limit the dataframe to only rows that contain values by entering df.loc[df["B"].notnull()]
. What I'm trying to find out is whether or not there is an equivalent version of .notnull()
that can select only rows where column B contains a number?
Upvotes: 2
Views: 6132
Reputation: 4821
To find integers and decimal numbers, define a function that will take a string as an input, attempt to convert a value to a floating point number (which will succeed if you have an integer or a floating point number), and will handle possible errors: a ValueError
is raised if you pass it a string that can't be converted to a floating point number, and a TypeError
is raised if a null value is passed to float()
, so handle these two exceptions:
def safe_float_convert(x):
try:
float(x)
return True # numeric, success!
except ValueError:
return False # not numeric
except TypeError:
return False # null type
Now use map()
to map the new function to column B of the dataframe, element-wise, and create a boolean mask:
mask = df['B'].map(safe_float_convert)
and use the .loc[]
function, passing in the boolean mask:
numeric_df = df.loc[mask]
Upvotes: 3