How to find data type error in pandas dataframe?

Question

df1:

  product   product_Id   Price
0 Mobile      G67129     4500
1 Earphone    H56438     8900
2 Heater      K12346     fgdht
3 Kitchen     566578     4500
4 4359        Gh1907     5674
5 plastic     G67129     Dfz67

df2:

  Column_Name   Expected_Dtype
0 product          String
1 product_Id       String
2 Price            int

I need to find the values in df1 whose data type does not match the expected type. The expected data type of each column is listed in df2.

Output:

   column_Name  Value   Exp_dtype  index
0  product      4359    String     4
1  product_Id   566578  String     3
2  Price        fgdht   int        2
3  Price        Dfz67   int        5

Sheng Zhuang · Accepted Answer

Since the types are mixed, every column ends up with dtype object, so the only approach I can think of is matching the strings against a regex pattern to pick out the values of the wrong type.

Here is my solution:

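First, find the rows with the wrong types:

import pandas as pd

# str.fullmatch (pandas 1.1+) requires the whole value to be digits or dots
bad_product = df1['product'].loc[df1['product'].str.fullmatch(r'[0-9.]+')]
bad_product_ID = df1['product_Id'].loc[df1['product_Id'].str.fullmatch(r'[0-9.]+')]
bad_price = df1['Price'].loc[~df1['Price'].str.fullmatch(r'[0-9.]+')]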
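One detail of the regex approach is worth checking on your own data: `str.match` only anchors at the start of the string, while `str.fullmatch` requires the whole value to match. Here is a minimal sketch of the difference. It assumes every cell in df1 is stored as a string, and the `probe` series is made up purely for illustration.

```python
import pandas as pd

# Sample data from the question; every cell is assumed to be a string.
df1 = pd.DataFrame({
    "product": ["Mobile", "Earphone", "Heater", "Kitchen", "4359", "plastic"],
    "product_Id": ["G67129", "H56438", "K12346", "566578", "Gh1907", "G67129"],
    "Price": ["4500", "8900", "fgdht", "4500", "5674", "Dfz67"],
})

# Hypothetical probe values, not part of the question's data.
probe = pd.Series(["4500", "45abc", "fgdht"])

# str.match anchors only at the start, so "45abc" still counts as numeric.
starts_numeric = probe.str.match(r"[0-9.]+")

# str.fullmatch requires the entire value to consist of digits or dots.
fully_numeric = probe.str.fullmatch(r"[0-9.]+")

# Price values that are not numeric from start to end.
bad_price = df1.loc[~df1["Price"].str.fullmatch(r"[0-9.]+"), "Price"]
```

On the question's data both methods flag the same rows, but a value such as "45abc" would slip through `str.match` in the Price column.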

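Then join the error rows together:

# stack() turns the three columns into (index, column) pairs;
# dropna() makes sure the empty cells are discarded on every pandas version
df3 = pd.concat([bad_product, bad_product_ID, bad_price], axis=1).stack().dropna().reset_index()
df3.columns = ['index', 'Column_Name', 'value']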

Finally, merge it with df2:

# both frames share the 'Column_Name' column, so merge on it directly
result = pd.merge(df3, df2, on='Column_Name', how='left')

Result:

  index  Column_Name  value   Expected_Dtype
0 2      Price        fgdht   int
1 3      product_Id   566578  String
2 4      product      4359    String
3 5      Price        Dfz67   int
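If you would rather not hard-code one line per column, the same idea can be driven by df2. This is only a sketch under some assumptions: the `is_bad` rule table and the `find_dtype_errors` helper are names I made up, the rules simply reuse the regex from above (a "String" column is flagged when a value is purely numeric, an "int" column when it is not), and every value is cast to str before matching.

```python
import pandas as pd

df1 = pd.DataFrame({
    "product": ["Mobile", "Earphone", "Heater", "Kitchen", "4359", "plastic"],
    "product_Id": ["G67129", "H56438", "K12346", "566578", "Gh1907", "G67129"],
    "Price": ["4500", "8900", "fgdht", "4500", "5674", "Dfz67"],
})
df2 = pd.DataFrame({
    "Column_Name": ["product", "product_Id", "Price"],
    "Expected_Dtype": ["String", "String", "int"],
})

NUMERIC = r"[0-9.]+"  # same pattern as in the answer

# Hypothetical rule table: expected dtype -> function returning True for bad values.
is_bad = {
    "String": lambda s: s.str.fullmatch(NUMERIC),   # purely numeric text is suspicious
    "int": lambda s: ~s.str.fullmatch(NUMERIC),     # anything non-numeric is wrong
}

def find_dtype_errors(data, schema):
    """Collect the values in `data` that break the rule for their expected dtype."""
    parts = []
    for col, expected in zip(schema["Column_Name"], schema["Expected_Dtype"]):
        values = data[col].astype(str)          # compare everything as text
        bad = values[is_bad[expected](values)]  # keep only the offending rows
        parts.append(pd.DataFrame({
            "index": bad.index.to_numpy(),
            "Column_Name": col,
            "value": bad.to_numpy(),
            "Expected_Dtype": expected,
        }))
    return pd.concat(parts, ignore_index=True)

result = find_dtype_errors(df1, df2)
print(result)
```

The rows come out grouped by column in the order listed in df2. Supporting another expected dtype only means adding one more entry to `is_bad`.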

When you have no idea where to begin, try breaking the problem down into small tasks. Hope this helps.
