Reputation: 7644
I have an Excel file which I'm importing as a pandas dataframe.
My dataframe df:
id name value
1 abc 22.3
2 asd 11.9
3 asw 2.4
I have a dictionary d in the format:
{ 'name' : 'str',
'value' : 'float64',
'id' : 'int64'}
I want to check whether the data types of the columns in my dataframe are the same as those defined in the dictionary.
The output can just be a string, e.g. if all the columns have their expected data type,
print("Success")
else:
print("column id has a different data type. Please check your file")
Upvotes: 4
Views: 6327
Reputation: 862641
You can get the type name of the first value in each column and then compare:
d1 = {x: type(df[x].iat[0]).__name__ for x in df.columns}
print (d1)
{'name': 'str', 'id': 'int64', 'value': 'float64'}
print (d == d1)
True
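Put together as a plain script, this approach looks as follows (the sample data below is assumed from the question, not read from the actual Excel file):

```python
import pandas as pd

# Sample data mirroring the question (assumed)
df = pd.DataFrame({"id": [1, 2, 3],
                   "name": ["abc", "asd", "asw"],
                   "value": [22.3, 11.9, 2.4]})
d = {"name": "str", "value": "float64", "id": "int64"}

# Type name of the first value in each column, e.g. 'int64', 'str', 'float64'
d1 = {x: type(df[x].iat[0]).__name__ for x in df.columns}

if d == d1:
    print("Success")  # prints "Success" for this sample data
else:
    bad = [k for k in d if d1.get(k) != d[k]]
    print("columns %s have different data types. Please check your file" % bad)
```

Note that this only inspects the first value of each column, so a mixed-type object column could slip through.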
Upvotes: 0
Reputation: 76917
Use
In [5759]: s = df.dtypes == pd.Series(d)
In [5760]: ss = s[~s]
In [5761]: if ss.empty:
...: print('success')
...: else:
...: print ('columns %s have different data type' % ss.index.tolist())
...:
...:
columns ['name'] have different data type
Details
In [5763]: df
Out[5763]:
id name value
0 1 abc 22.3
1 2 asd 11.9
2 3 asw 2.4
In [5764]: d
Out[5764]: {'id': 'int64', 'name': 'str', 'value': 'float64'}
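The IPython session above can be condensed into a plain script (sample data assumed from the question). Note that with this approach the name column is reported as a mismatch, because pandas stores strings under the object dtype rather than str:

```python
import pandas as pd

# Sample data matching the question (assumed)
df = pd.DataFrame({"id": [1, 2, 3],
                   "name": ["abc", "asd", "asw"],
                   "value": [22.3, 11.9, 2.4]})
d = {"id": "int64", "name": "str", "value": "float64"}

# Element-wise comparison of actual dtypes against the expected ones
s = df.dtypes == pd.Series(d)
mismatched = s[~s]

if mismatched.empty:
    print("success")
else:
    # prints: columns ['name'] have different data type
    print("columns %s have different data type" % mismatched.index.tolist())
```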
Upvotes: 1
Reputation: 402493
Call dtypes, convert to a dictionary, and compare.
d1 = df.dtypes.astype(str).to_dict()
d1
{'id': 'int64', 'name': 'object', 'value': 'float64'}
d1 == {'name' : 'str', 'value' : 'float64', 'id' : 'int64'}
False
Unfortunately, name is shown to be an object column, not str, hence the False. I would recommend doing a quick iteration over your dict and changing all entries where str appears to object (this shouldn't hurt):
d2 = {k: 'object' if v == 'str' else v for k, v in d.items()}
d2
{'id': 'int64', 'name': 'object', 'value': 'float64'}
d1 == d2
True
To check which column(s) are incorrect, the solution becomes a little more involved, but is still quite easy with a list comprehension.
[k for k in d1 if d1[k] != d2.get(k)]
['name']
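The normalisation and the mismatch report can be combined into one script (sample data assumed from the question):

```python
import pandas as pd

# Sample data matching the question (assumed)
df = pd.DataFrame({"id": [1, 2, 3],
                   "name": ["abc", "asd", "asw"],
                   "value": [22.3, 11.9, 2.4]})
d = {"name": "str", "value": "float64", "id": "int64"}

# Actual dtypes as strings, e.g. {'id': 'int64', 'name': 'object', ...}
d1 = df.dtypes.astype(str).to_dict()

# Normalise the expected dict: pandas stores strings under the object dtype
d2 = {k: "object" if v == "str" else v for k, v in d.items()}

bad = [k for k in d1 if d1[k] != d2.get(k)]
if not bad:
    print("Success")  # prints "Success" for this sample data
else:
    print("columns %s have different data types. Please check your file" % bad)
```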
Upvotes: 5