Reputation: 905
I have a dataframe with multiple columns that contain lists, and the lists in each row have different lengths:
tweetid tweet_date user_mentions hashtags
00112 11-02-2014 [] []
00113 11-02-2014 [00113] [obama, trump]
00114 30-07-2015 [00114, 00115] [hillary, trump, sanders]
00115 30-07-2015 [] []
The dataframe is a concatenation of three different dataframes, and I'm not sure whether the items in the lists all have the same dtype. For example, in the user_mentions column, sometimes the data looks like this:
[00114, 00115]
But sometimes it looks like this:
['00114','00115']
How can I set the dtype for the items in the lists?
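One way to confirm whether the items really differ in type is to collect the Python type names found inside each list column. The sketch below rebuilds a small, hypothetical version of the example frame by hand. Note that 00114 is not a valid integer literal in Python 3, so integer IDs are written without leading zeros (113); the helper name element_types is illustrative only.

```python
import pandas as pd

# Hypothetical reconstruction of the example frame: one row holds int IDs,
# another holds zero-padded string IDs, as can happen after concatenating sources.
df = pd.DataFrame({
    "tweetid": ["00112", "00113", "00114", "00115"],
    "user_mentions": [[], [113], ["00114", "00115"], []],
    "hashtags": [[], ["obama", "trump"], ["hillary", "trump", "sanders"], []],
})


def element_types(series):
    """Return the sorted type names found inside the lists of a column."""
    return sorted({type(item).__name__ for row in series for item in row})


for col in ["user_mentions", "hashtags"]:
    print(col, element_types(df[col]))
```

If a column reports more than one type name, its items need normalising.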
Upvotes: 4
Views: 9290
Reputation: 503
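# Pad int IDs (e.g. 114) back to 5-character strings such as '00114'; string IDs are left as-is
df['user_mentions'] = df['user_mentions'].map(lambda x: [str(y).zfill(5) if isinstance(y, int) else y for y in x])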
If your objective is to convert all user_mentions to str, the above might help. I would also look into this post for unnesting.
As mentioned, pandas is not really designed to hold lists as values.
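The unnesting route mentioned above can be sketched with Series.explode (available from pandas 0.25): each list item gets its own row, the items are normalised with vectorised string methods, and they are then collected back into one list per original row. This is a minimal sketch assuming the IDs are meant to be five-character zero-padded strings; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample with mixed int / str IDs in the list column.
df = pd.DataFrame({
    "tweetid": ["00112", "00113", "00114", "00115"],
    "user_mentions": [[], [113], ["00114", "00115"], []],
})

# One row per list item; an empty list becomes a single NaN row.
exploded = df["user_mentions"].explode()

# Normalise every non-missing item to a zero-padded 5-character string
# (assumption: all IDs share that width).
normalised = exploded.dropna().astype(str).str.zfill(5)

# Group by the original index to rebuild one list per row, then
# restore empty lists for rows that had no mentions.
rebuilt = normalised.groupby(level=0).agg(list)
df["user_mentions"] = rebuilt.reindex(df.index).apply(
    lambda v: v if isinstance(v, list) else []
)
print(df)
```

Working on the exploded form keeps the conversion vectorised; the regrouping step is only needed if you want lists back in the cells.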
Upvotes: 2
Reputation: 5359
Pandas DataFrames are not really designed to hold lists as row/column values, which is why you are running into difficulty. You could do:
python3.x:
df['user_mentions'].apply(lambda x: list(map(int, x)))
python2.x:
df['user_mentions'].apply(lambda x: map(int, x))
In Python 3, map returns a lazy map object, so you have to wrap it in list(); in Python 2, map already returns a list, so there is no need to call list() explicitly.
In the above lambda, x is the list in each row, and you are mapping its values to int.
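A small Python 3 check of the int approach on a hypothetical mixed column. Note that int() parses '00114' as 114, so leading zeros are dropped; that is fine if the IDs are only compared numerically, but not if they must keep their original string form.

```python
import pandas as pd

# Hypothetical column mixing int and zero-padded string IDs.
df = pd.DataFrame({"user_mentions": [[], [113], ["00114", "00115"], []]})

# In Python 3, map() returns a lazy iterator, so list() materialises it.
df["user_mentions"] = df["user_mentions"].apply(lambda x: list(map(int, x)))

print(df["user_mentions"].tolist())  # [[], [113], [114, 115], []]
```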
Upvotes: 8
Reputation: 1371
This should work; it makes the lists in a column contain strings. df[0] selects the column labelled 0, so substitute the real column name, e.g. df['user_mentions']:
df[0].apply(lambda x: [str(y) for y in x])
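Since the question mentions several list columns, the same comprehension can be run over each of them in a loop. A sketch, assuming the list columns are user_mentions and hashtags (list_columns is a hypothetical name). Note that str(113) gives '113', so leading zeros that were lost when an ID was stored as an int are not restored; padding such as str(y).zfill(5) would be needed for that.

```python
import pandas as pd

# Hypothetical frame mirroring the question's layout.
df = pd.DataFrame({
    "tweetid": ["00112", "00113", "00114", "00115"],
    "user_mentions": [[], [113], ["00114", "00115"], []],
    "hashtags": [[], ["obama", "trump"], ["hillary", "trump", "sanders"], []],
})

# Columns assumed to hold lists; adjust to match the real frame.
list_columns = ["user_mentions", "hashtags"]

for col in list_columns:
    # Convert every item of every list to str; empty lists stay empty.
    df[col] = df[col].apply(lambda x: [str(y) for y in x])

print(df["user_mentions"].tolist())  # [[], ['113'], ['00114', '00115'], []]
```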
Upvotes: 1