Shubham R

Reputation: 7644

Check for datatypes of columns in pandas

I have an Excel file that I'm importing as a pandas DataFrame.

My dataframe df:

id    name    value
1      abc     22.3
2      asd     11.9
3      asw     2.4

I have a dictionary d in format:

{ 'name' : 'str',
  'value' : 'float64',
  'id' : 'int64'}

I want to check whether the data types of the columns in my dataframe are the same as those defined in the dictionary.

The output can be a simple message. If every column has its expected data type:

print("Success")

otherwise:

print("column id has a different data type. Please check your file")

Upvotes: 4

Views: 6327

Answers (3)

jezrael

Reputation: 862641

You can take the type name of the first value in each column as a string and then compare it with your dictionary:

d1 = {x: type(df[x].iat[0]).__name__ for x in df.columns}
print (d1)
{'name': 'str', 'id': 'int64', 'value': 'float64'}

print (d == d1)
True
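For reference, here is a minimal self-contained sketch of the same idea that also reports which columns differ. The sample frame is rebuilt from the question with explicitly pinned dtypes, and the `bad`, `mixed` and `first_type` names are only illustrative, not part of the original answer. Note that this approach inspects only the first value of each column, so a column with mixed types can still pass.

```python
import pandas as pd

# Sample data mirroring the question; dtypes are pinned so the result
# does not depend on platform or pandas defaults (an assumption made here).
df = pd.DataFrame({
    'id': pd.Series([1, 2, 3], dtype='int64'),
    'name': pd.Series(['abc', 'asd', 'asw'], dtype=object),
    'value': pd.Series([22.3, 11.9, 2.4], dtype='float64'),
})
d = {'name': 'str', 'value': 'float64', 'id': 'int64'}

# Type name of the first value in each column.
d1 = {x: type(df[x].iat[0]).__name__ for x in df.columns}

# Illustrative list: columns whose first-value type name differs from d.
bad = [col for col, expected in d.items() if d1.get(col) != expected]

if not bad:
    print("Success")
else:
    print("columns %s have different data type" % bad)

# Limitation: only the first value is looked at, so this mixed column
# would still be reported as 'str'.
mixed = pd.Series(['abc', 1], dtype=object)
first_type = type(mixed.iat[0]).__name__
```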

Upvotes: 0

Zero

Reputation: 76917

Use

In [5759]: s = df.dtypes == pd.Series(d)

In [5760]: ss = s[~s]

In [5761]: if ss.empty:
      ...:     print('success')
      ...: else:
      ...:     print ('columns %s have different data type' % ss.index.tolist())
      ...:
      ...:
columns ['name'] have different data type

Details

In [5763]: df
Out[5763]:
   id name  value
0   1  abc   22.3
1   2  asd   11.9
2   3  asw    2.4

In [5764]: d
Out[5764]: {'id': 'int64', 'name': 'str', 'value': 'float64'}
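Depending on the pandas version, comparing two Series with == may require both to carry the same index labels in the same order, which a dictionary written in a different key order does not guarantee. A hedged sketch of the same check that first aligns the expected types to the column order and compares dtype names as strings (the `expected` and `actual` names and the pinned dtypes are assumptions for illustration):

```python
import pandas as pd

# Sample data mirroring the question, with dtypes pinned explicitly.
df = pd.DataFrame({
    'id': pd.Series([1, 2, 3], dtype='int64'),
    'name': pd.Series(['abc', 'asd', 'asw'], dtype=object),
    'value': pd.Series([22.3, 11.9, 2.4], dtype='float64'),
})
d = {'name': 'str', 'value': 'float64', 'id': 'int64'}

# Align the expected types to the dataframe's column order; columns
# missing from d become NaN and are therefore reported as mismatches.
expected = pd.Series(d).reindex(df.columns)

# Compare dtype names as plain strings, e.g. 'int64', 'object'.
actual = df.dtypes.astype(str)

s = actual == expected
ss = s[~s]

if ss.empty:
    print('success')
else:
    print('columns %s have different data type' % ss.index.tolist())
```

As in the output above, name is flagged because pandas stores this column as object rather than str.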

Upvotes: 1

cs95

Reputation: 402493

Call dtypes, convert to a dictionary and compare.

d1 = df.dtypes.astype(str).to_dict()

d1
{'id': 'int64', 'name': 'object', 'value': 'float64'}

d1 == {'name' : 'str', 'value' : 'float64', 'id' : 'int64'}
False 

Unfortunately, name is reported as an object column, not str, hence the False. I would recommend a quick pass over your dict d that changes every entry where str appears to object (this shouldn't hurt):

d2 = {k : 'object' if v == 'str' else v for k, v in d.items()}

d2
{'id': 'int64', 'name': 'object', 'value': 'float64'}

d1 == d2
True

To check which column(s) are incorrect, the solution becomes a little more involved, but is still quite easy with a list comprehension.

[k for k in d1 if d1[k] != d2.get(k)] 
['name']
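Putting the steps together, a minimal sketch that produces the kind of message asked for in the question could look like this. The function name `mismatched_columns` is hypothetical, and the sample frame pins its dtypes explicitly, since the default string and integer dtypes can vary with the pandas version and platform:

```python
import pandas as pd

def mismatched_columns(df, expected):
    """Return the columns whose dtype name differs from `expected`."""
    # Map 'str' to 'object', the dtype name reported for this string
    # column (assumption: the column is stored with object dtype).
    normalized = {k: 'object' if v == 'str' else v for k, v in expected.items()}
    actual = df.dtypes.astype(str).to_dict()
    # A column missing from the dataframe yields None and is reported too.
    return [k for k in normalized if actual.get(k) != normalized[k]]

df = pd.DataFrame({
    'id': pd.Series([1, 2, 3], dtype='int64'),
    'name': pd.Series(['abc', 'asd', 'asw'], dtype=object),
    'value': pd.Series([22.3, 11.9, 2.4], dtype='float64'),
})
d = {'name': 'str', 'value': 'float64', 'id': 'int64'}

bad = mismatched_columns(df, d)
if not bad:
    print("Success")
else:
    print("columns %s have different data type. Please check your file" % bad)

# Example of a failing check: 'id' is int64, not float64.
bad_id = mismatched_columns(df, {'id': 'float64', 'name': 'str', 'value': 'float64'})
```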

Upvotes: 5
