Jon Heston
Jon Heston

Reputation: 41

Select rows from Pandas dataframe where a specific column contains numbers

I have a data frame where a column (column B) can contain a letters, a number or nothing at all. Lets say the data frame is:

A   B    C
1   2    Dog
3   C    Bird
30  nan  Cat
11  4.1  Wolf

And I want to get rows conditionally, based on whether there is a number in column B:

A   B    C
1   2    Dog
11  4.1  Wolf

I have found that I can limit the dataframe to only rows that contain values by entering df.loc[df["B"].notnull()]. What I'm trying to find out is whether or not there is an equivalent version of .notnull() that can select only rows where column B contains a number?

Upvotes: 2

Views: 6132

Answers (1)

charlesreid1
charlesreid1

Reputation: 4821

To find integers and decimal numbers, define a function that will take a string as an input, attempt to convert a value to a floating point number (which will succeed if you have an integer or a floating point number), and will handle possible errors: a ValueError is raised if you pass it a string that can't be converted to a floating point number, and a TypeError is raised if a null value is passed to float(), so handle these two exceptions:

def safe_float_convert(x):
    try:
        float(x)
        return True # numeric, success!
    except ValueError:
        return False # not numeric
    except TypeError:
        return False # null type

Now use map() to map the new function to column B of the dataframe, element-wise, and create a boolean mask:

mask = df['B'].map(safe_float_convert)

and use the .loc[] function, passing in the boolean mask:

numeric_df = df.loc[mask]

Upvotes: 3

Related Questions