Reputation: 331
I want to identify whether a pandas column contains a list in every row.
import pandas as pd
df = pd.DataFrame({'X': [1, 2, 3], 'Y': [[34], [37, 45], [48, 50, 57]], 'Z': ['A', 'B', 'C']})
df
Out[160]:
X Y Z
0 1 [34] A
1 2 [37, 45] B
2 3 [48, 50, 57] C
df.dtypes
Out[161]:
X int64
Y object
Z object
dtype: object
Since the dtype of strings is "object", I'm unable to distinguish between columns that are strings and lists (of integer or strings).
How do I identify that column "Y" is a list of int?
Upvotes: 15
Views: 10460
Reputation: 862741
You can use map (or applymap for pandas versions prior to v2.1.0) to get the type of each value, compare it to the desired type, and then use all to check whether every value in a column is True:
print (df.map(type))
X Y Z
0 <class 'int'> <class 'list'> <class 'str'>
1 <class 'int'> <class 'list'> <class 'str'>
2 <class 'int'> <class 'list'> <class 'str'>
a = (df.map(type) == list).all()
print (a)
X False
Y True
Z False
dtype: bool
Or:
a = df.map(lambda x: isinstance(x, list)).all()
print (a)
X False
Y True
Z False
dtype: bool
And if you need a list of the matching columns:
L = a.index[a].tolist()
print (L)
['Y']
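The checks above confirm that a column holds lists, but not what the lists contain. To answer the "list of int" part of the question, the element types can be tested as well. This is a minimal sketch, not part of the original answer: the helper is_list_of is a hypothetical name, and it uses Series.map per column so it should not depend on the DataFrame.map / applymap rename.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [[34], [37, 45], [48, 50, 57]], 'Z': ['A', 'B', 'C']})

def is_list_of(value, element_type):
    # Hypothetical helper: True only for a list whose elements are all element_type.
    # Caveats: an empty list passes, and bool counts as int in Python.
    return isinstance(value, list) and all(isinstance(item, element_type) for item in value)

# Keep the columns where every row is a list of int
list_of_int_cols = [c for c in df.columns if df[c].map(lambda v: is_list_of(v, int)).all()]
print(list_of_int_cols)  # ['Y']
```

Passing str instead of int would look for columns of lists of strings in the same way.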
If you want to check the dtypes instead (note that strings, lists, and dicts all have dtype object, so this cannot tell them apart):
print (df.dtypes)
X int64
Y object
Z object
dtype: object
a = df.dtypes == 'int64'
print (a)
X True
Y False
Z False
dtype: bool
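The dtype check and the element-wise check can also be combined: since lists can only live in object columns, the numeric columns can be skipped before testing the values. The following is a rough sketch under that assumption, with object_cols and list_columns as illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [[34], [37, 45], [48, 50, 57]], 'Z': ['A', 'B', 'C']})

# Only object columns can hold Python lists, so narrow down the candidates first
object_cols = df.select_dtypes(include='object')

# Test each remaining column value by value
list_columns = [
    col for col in object_cols.columns
    if object_cols[col].map(lambda v: isinstance(v, list)).all()
]
print(list_columns)
```

On a wide frame with mostly numeric columns this avoids running the Python-level isinstance check on values that cannot be lists.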
Upvotes: 17
Reputation: 106
If your dataset is big, take a sample before applying the type function. You can then check:
If the most common type is list:
# Parentheses replace the backslash continuations: a comment after "\" is a syntax error
(df
    .sample(100)
    .map(type)  # use .applymap(type) prior to v2.1.0
    .mode(0)
    .astype(str)
    .eq("<class 'list'>"))
If all values are list:
# Parentheses replace the backslash continuations: a comment after "\" is a syntax error
(df
    .sample(100)
    .map(type)  # use .applymap(type) prior to v2.1.0
    .astype(str)
    .eq("<class 'list'>")
    .all(0))
If any values are list:
# Parentheses replace the backslash continuations: a comment after "\" is a syntax error
(df
    .sample(100)
    .map(type)  # use .applymap(type) prior to v2.1.0
    .astype(str)
    .eq("<class 'list'>")
    .any(0))
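Note that sample(100) raises a ValueError when the frame has fewer than 100 rows. A possible way to wrap the sampling idea in a reusable function is sketched below; sampled_list_columns, its parameters n and random_state, and the test frame big are all illustrative names, not something from the answers above.

```python
import pandas as pd

def sampled_list_columns(df, n=100, random_state=0):
    # Cap the sample size so frames with fewer than n rows do not raise ValueError
    sample = df.sample(min(n, len(df)), random_state=random_state)
    # Series.map is used per column, so no DataFrame.map / applymap switch is needed
    return [
        col for col in sample.columns
        if sample[col].map(lambda v: isinstance(v, list)).all()
    ]

# Illustrative data: 1000 rows, one column of lists
big = pd.DataFrame({
    'X': range(1000),
    'Y': [[i, i + 1] for i in range(1000)],
    'Z': ['A'] * 1000,
})
print(sampled_list_columns(big))  # ['Y']
```

Because only a sample is inspected, a column that is mostly lists with a few other values can still be reported as a list column, so the result is an estimate rather than a guarantee.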
Upvotes: 4