Select only integers from a column of mixed data types in pandas

Question

I have a dataframe df as shown below. The column col2 has null values, blank values, integers and even float values. I want to derive a new dataframe new_df from df where the column col2 has only integer values.

import pandas as pd
import numpy as np

col1 = ["a", "b", "c", "d", "e", "f", "g", "h"]
col2 = ["25.45", "", "200", np.nan, "N/A", "null", "35", "5,300"]

df = pd.DataFrame({"col1": col1, "col2": col2})

This is how df looks:

  col1   col2
0    a  25.45
1    b       
2    c    200
3    d    NaN
4    e    N/A
5    f   null
6    g     35
7    h  5,300
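In this sample data the "integers" and "floats" in col2 are not numbers yet. They are strings, and only the np.nan is a real float. That is why the answers below call astype(str) before using the .str accessor. A quick way to confirm this with the same df (a minimal sketch; cell_types is just an illustrative name):

```python
from collections import Counter

import numpy as np
import pandas as pd

col1 = ["a", "b", "c", "d", "e", "f", "g", "h"]
col2 = ["25.45", "", "200", np.nan, "N/A", "null", "35", "5,300"]
df = pd.DataFrame({"col1": col1, "col2": col2})

# Tally the Python type of every cell in col2
cell_types = Counter(type(v).__name__ for v in df["col2"])
print(cell_types)  # Counter({'str': 7, 'float': 1})
```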

Below is my desired output for new_df where the column col2 values are only integers:

  col1   col2  
2    c    200
6    g     35

I have tried using pd.to_numeric() and isdigit(), but they expect a Series as input. Is there an easy way to get the desired output?

cs95 · Accepted Answer

`str.isdigit`

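Keep only the rows whose col2 value consists entirely of digits, using boolean indexing. Converting to str first makes the NaN row evaluate to False instead of NaN, so the mask is purely boolean:

df2 = df[df.col2.astype(str).str.isdigit()]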
print(df2)
  col1 col2
2    c  200
6    g   35
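To see why the astype(str) step matters, here is a sketch of what happens without it. It assumes col2 is a plain object column (forced here with dtype=object so the behaviour is the same across pandas versions); raw_mask is an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e", "f", "g", "h"],
    # dtype=object keeps col2 as a plain object column
    "col2": pd.Series(["25.45", "", "200", np.nan, "N/A", "null", "35", "5,300"], dtype=object),
})

# Without astype(str), the missing cell yields NaN instead of True/False
raw_mask = df["col2"].str.isdigit()
print(raw_mask.isna().sum())  # 1

# A mask containing NaN cannot be used for boolean indexing
try:
    df[raw_mask]
except ValueError as exc:
    print("ValueError:", exc)

# Treating the missing result as "not a digit string" selects the same rows
# as the astype(str) version
df2 = df[raw_mask.eq(True)]
print(df2)
```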

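P.S., to convert "col2" to integer, build a new frame rather than assigning into the filtered slice (assigning into it directly can trigger a SettingWithCopyWarning):

df2 = df2.assign(col2=df2['col2'].astype(int))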
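If you prefer to filter and convert in one step, a possible sketch is to chain DataFrame.astype with a {column: dtype} mapping after the boolean mask. This is one option among several, not the only idiomatic way:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "col2": ["25.45", "", "200", np.nan, "N/A", "null", "35", "5,300"],
})

# Filter on the digit mask, then cast col2 in the same chain.
# astype returns a new frame, so nothing is written back into a slice of df.
df2 = df[df["col2"].astype(str).str.isdigit()].astype({"col2": int})
print(df2)
print(df2.dtypes)
```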

`str.contains`

You could also use str.contains, although it is slower since it uses a regex.

df[df.col2.astype(str).str.contains(r'^\d+$')]

  col1 col2
2    c  200
6    g   35
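A variation on the regex approach: str.contains accepts na=False, which treats the missing cell as a non-match, so the astype(str) call can be dropped. str.fullmatch (available in pandas 1.1 and later) anchors the pattern to the whole string, so no ^ and $ are needed. A minimal sketch; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "col2": ["25.45", "", "200", np.nan, "N/A", "null", "35", "5,300"],
})

# na=False makes the NaN row evaluate to False instead of NaN
via_contains = df[df["col2"].str.contains(r"^\d+$", na=False)]

# fullmatch requires the whole string to match the pattern
via_fullmatch = df[df["col2"].str.fullmatch(r"\d+", na=False)]

print(via_contains)
print(via_fullmatch.equals(via_contains))  # True
```

On this data both select rows 2 and 6, the same as the answer above.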

`pd.to_numeric`

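A third solution is somewhat hacky, but uses pd.to_numeric. It needs one pre-processing step to filter out floats: replacing the decimal point with a character that cannot be parsed (here '|') makes strings like "25.45" fail to convert, so errors='coerce' turns them into NaN along with the other non-numeric values.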

v = df.col2.astype(str).str.replace('.', '|', regex=False)
df[pd.to_numeric(v, errors='coerce').notna()]

  col1 col2
2    c  200
6    g   35
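A sketch that shows why the replace step is needed, running pd.to_numeric on the same data with and without it (as_str, naive and strict are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "col2": ["25.45", "", "200", np.nan, "N/A", "null", "35", "5,300"],
})

as_str = df["col2"].astype(str)

# Without the replace step, "25.45" parses as a float and slips through
naive = df[pd.to_numeric(as_str, errors="coerce").notna()]
print(naive.index.tolist())  # [0, 2, 6]

# After swapping "." for "|", "25|45" is unparseable and is coerced to NaN
v = as_str.str.replace(".", "|", regex=False)
strict = df[pd.to_numeric(v, errors="coerce").notna()]
print(strict.index.tolist())  # [2, 6]
```

Note that this approach still accepts anything else pd.to_numeric can parse, such as "-5" or "1e3", which str.isdigit would reject.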
