Reputation: 1499
I have 2 data frames with 25 columns. I am trying to get the distributions for each column in both data frames, for a comparative study.
I do something like this:
count1=df1[col].value_counts().reset_index()
count2=df2[col].value_counts().reset_index()
merged=count1.merge(count2,how='outer',on='index')
Some columns contain lists instead of strings. I want to convert those values to strings and then do the steps above.
df1[col+'_str']=df1[col].str.join(' ')
df2[col+'_str']=df2[col].str.join(' ')
Now, the problem is that I don't know in advance which columns contain lists. Is there a way to find out whether a column holds lists or strings?
I tried this:
if isinstance(df1[col].iloc[0], list):
But a list column that has no value in its 0th row will slip past this test!
How can I find out the type of contents in a dataframe column?
I referred to a related SO question, but couldn't get much out of it.
Upvotes: 2
Views: 1460
Reputation: 303
If you want to know whether any value in the column is a list, you can apply the is_list_like function to the column and call the any method on the resulting boolean Series:
from pandas.api.types import is_list_like
df[column].apply(is_list_like).any()
This returns True if any value in the column is list-like. Note that is_list_like also accepts tuples, sets and arrays, but not strings.
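Since the question is about not knowing which columns hold lists, here is a minimal sketch that runs the same check over every column. The sample data and the names `df` and `list_columns` are made up for illustration. The first row of the list column is deliberately empty.

```python
import pandas as pd
from pandas.api.types import is_list_like

# Hypothetical sample data: "tags" holds lists (with a missing first row),
# "city" holds plain strings.
df = pd.DataFrame({
    "tags": [None, ["a", "b"], ["c"]],
    "city": ["Oslo", "Lima", None],
})

# Keep the names of the columns in which at least one value is list-like.
list_columns = [c for c in df.columns if df[c].apply(is_list_like).any()]
print(list_columns)
```

Because every value is inspected, a missing value in the 0th row does not hide the column, at the cost of scanning the whole column.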
Upvotes: 0
Reputation: 7058
You can select the columns with dtype object (strings, lists, ...):
df_obj = df.select_dtypes(include=[object])
and then try something like:
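def myfunction(value):
    # Join list values into one string; leave everything else unchanged.
    if isinstance(value, list):
        return ' '.join(value)
    else:
        return value

# DataFrame.apply passes whole columns, so map the function over each column's values.
df_str = df_obj.apply(lambda column: column.map(myfunction))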
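To connect this back to the comparison in the question, here is a rough sketch of joining the list values and then lining up the two distributions. The helper `join_if_list`, the sample frames and the column labels `df1`/`df2` are assumptions for illustration. It aligns the counts with `pd.concat` rather than merging on an `'index'` column, because the column name that `reset_index()` produces differs between pandas versions.

```python
import pandas as pd

def join_if_list(value):
    # Turn a list into a space-separated string; pass other values through.
    return " ".join(value) if isinstance(value, list) else value

# Hypothetical sample data with one list-valued column in each frame.
df1 = pd.DataFrame({"tags": [["a", "b"], ["c"], ["a", "b"]]})
df2 = pd.DataFrame({"tags": [["c"], None, ["c"]]})

# Count the joined values per frame (missing values are dropped by value_counts).
counts1 = df1["tags"].map(join_if_list).value_counts().rename("df1")
counts2 = df2["tags"].map(join_if_list).value_counts().rename("df2")

# Concatenating along columns aligns on the distinct values (an outer join).
merged = pd.concat([counts1, counts2], axis=1)
print(merged)
```

Values that occur in only one frame show up as NaN in the other frame's column.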
Upvotes: 3
Reputation: 16251
You can test the first 10 values (for instance) like this:
df1[col].head(10).apply(lambda v: isinstance(v, list)).any()
This will be true if any value in the first 10 is a list.
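If the first rows may all be empty, one possible variation (a sketch, not part of the answer above) is to drop the missing values first and test the first remaining one. The function name `first_value_is_list` and the sample data are hypothetical.

```python
import pandas as pd

def first_value_is_list(series):
    # Ignore missing values, then check the type of the first real value.
    non_null = series.dropna()
    return (not non_null.empty) and isinstance(non_null.iloc[0], list)

# Hypothetical sample data: the list column starts with two missing rows.
df1 = pd.DataFrame({
    "tags": [None, None, ["a", "b"]],
    "city": ["Oslo", None, "Lima"],
})

for col in df1.columns:
    print(col, first_value_is_list(df1[col]))
```

This assumes a column holds a single kind of value; a column mixing strings and lists would need every value checked.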
Upvotes: 4