msmazh

Reputation: 905

Pandas column of list: How to set the dtype of items

I have a dataframe with multiple columns that contain lists, and the lists have different lengths from row to row:

tweetid tweet_date    user_mentions       hashtags
00112   11-02-2014    []                  []
00113   11-02-2014    [00113]             [obama, trump]
00114   30-07-2015    [00114, 00115]      [hillary, trump, sanders]
00115   30-07-2015    []                  []

The dataframe is a concat of three different dataframes, so I'm not sure whether the items in the lists all have the same dtype. For example, in the user_mentions column, sometimes the data looks like this:

[00114, 00115]

But sometimes it looks like this:

['00114','00115'] 

How can I set the dtype for the items in the lists?

Upvotes: 4

Views: 9290

Answers (3)

Francisco

Reputation: 503

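If your objective is to convert all user_mentions to str, this might help:

df['user_mentions'].map(lambda x: ['00' + str(y) if isinstance(y, int) else y for y in x])

This returns a new Series, so assign it back to df['user_mentions'] to keep the change. The '00' prefix assumes every int lost exactly two leading zeros. I would also look into this post for unnesting. As mentioned, pandas is not really designed to house lists as values.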
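For reference, here is a minimal runnable sketch of the same idea. It swaps the hard-coded '00' prefix for str.zfill, on the assumption that every ID is meant to be 5 characters wide, which the question does not actually confirm. The sample data, ID_WIDTH, and to_padded_str are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: a mix of int and str items, as described in the question.
df = pd.DataFrame({
    "tweetid": ["00112", "00113", "00114"],
    "user_mentions": [[], [113], ["00114", 115]],
})

ID_WIDTH = 5  # assumption: every ID should be 5 characters wide


def to_padded_str(items):
    """Convert every item to str and left-pad it with zeros to ID_WIDTH."""
    return [str(item).zfill(ID_WIDTH) for item in items]


# map returns a new Series, so assign it back to the column.
df["user_mentions"] = df["user_mentions"].map(to_padded_str)
print(df["user_mentions"].tolist())
# [[], ['00113'], ['00114', '00115']]
```

Unlike the '00' prefix, zfill leaves strings that are already 5 characters long unchanged and pads shorter values by however many zeros they are missing.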

Upvotes: 2

d_kennetz

Reputation: 5359

Pandas DataFrames are not really designed to house lists as row/column values, which is why you are facing difficulty. You could do:

Python 3.x:

df['user_mentions'].apply(lambda x: list(map(int, x)))

Python 2.x:

df['user_mentions'].apply(lambda x: map(int, x))

In Python 3, map returns a map object, so you have to convert it to a list explicitly. In Python 2, map already returns a list, so no conversion is needed.

In the lambda above, x is the list in each row, and its values are mapped to int.
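A small runnable sketch of the Python 3 version, using made-up sample data. Note that converting to int drops any leading zeros, so this only fits if the IDs are meant to be numbers rather than zero-padded strings.

```python
import pandas as pd

# Hypothetical sample data: lists mixing str and int items.
df = pd.DataFrame({"user_mentions": [[], ["00113"], ["00114", 115]]})

# apply returns a new Series, so assign it back to the column.
df["user_mentions"] = df["user_mentions"].apply(lambda x: list(map(int, x)))
print(df["user_mentions"].tolist())
# [[], [113], [114, 115]]
```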

Upvotes: 8

bravosierra99

Reputation: 1371

This should work. Here I'm converting the items in the lists of the column labeled 0 to strings (for the dataframe in the question, use df['user_mentions'] instead of df[0]):

df[0].apply(lambda x: [str(y) for y in x])
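Since the question mentions several list columns, here is a rough sketch that applies the same conversion to each of them and then checks which item types remain. The sample data and the names list_columns and item_types are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data with two list columns.
df = pd.DataFrame({
    "user_mentions": [[], [113], ["00114", 115]],
    "hashtags": [[], ["obama", "trump"], ["hillary"]],
})

list_columns = ["user_mentions", "hashtags"]  # assumed names of the list columns

# Convert every item in every list to str, one column at a time.
for col in list_columns:
    df[col] = df[col].apply(lambda x: [str(y) for y in x])

# Collect the set of item types left across all list columns.
item_types = {type(y) for col in list_columns for items in df[col] for y in items}
print(item_types)
# {<class 'str'>}
```

Running the same type check before the conversion is a quick way to see whether the concatenated dataframes really mixed int and str items.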

Upvotes: 1
