Reputation: 1295
I have a problem with the type of one of the columns in my pandas DataFrame. The column is saved in a csv file as a string, and I want to use it as a tuple so I can convert it into a list of numbers. Here is a very simple csv:
ID,LABELS
1,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
2,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
If I load it with the "read_csv" function, I get a column of strings. I have tried to convert it to a list, but I get the list version of the string:
df.LABELS.apply(lambda x: list(x))
returns:
['(', '1', '.', '0', ..., '4', '.', '0', ')']
Any idea on how to be able to do it?
Thank you.
Upvotes: 29
Views: 68965
Reputation: 43
Sorry I was late to the party. For other latecomers, I got this to work based on the replies above:
df['hashtags'] = df.apply(lambda row: row['hashtags'].strip('[]').replace('"', '').replace(' ', '').split(',') , axis=1)
I loaded a csv with some columns looking like this ...,['hashtag1','hashtag2'],... and pandas loaded the column as a string object. The code above converted it to a list, and I then used "explode" to flatten the data.
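A rough sketch of those steps on made-up data (the frame and its values below are just placeholders for the csv described above):
import pandas as pd

# Made-up stand-in for the csv: the list column was written out as a string
df = pd.DataFrame({'hashtags': ['["hashtag1","hashtag2"]', '["hashtag3"]']})

# Strip the brackets and quotes, drop spaces, then split on commas into real lists
df['hashtags'] = df.apply(
    lambda row: row['hashtags'].strip('[]').replace('"', '').replace(' ', '').split(','),
    axis=1)

# "explode" then flattens each list into one row per hashtag
print(df.explode('hashtags'))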
Upvotes: 3
Reputation: 12048
Alternatively, you might consider regular expressions:
import re

pattern = re.compile(r"[0-9]\.[0-9]")
df.LABELS.apply(pattern.findall)
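Note that findall returns the matches as strings, and the pattern above only captures a single digit on each side of the decimal point; if your values can be longer, you could widen it, for example:
# \d+\.\d+ also matches multi-digit values; findall still returns strings
pattern = re.compile(r"\d+\.\d+")
df.LABELS.apply(pattern.findall)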
Upvotes: 1
Reputation: 862761
You can strip the parentheses and split on the commas:
df['LABELS'] = df['LABELS'].str.strip('()').str.split(',')
But if there are no NaNs here, a list comprehension works nicely too:
df['LABELS'] = [x.strip('()').split(',') for x in df['LABELS']]
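Either way the split leaves string elements; if you need actual numbers, converting them inside the same comprehension should work, for example:
# convert each piece to float so the lists hold numbers rather than strings
df['LABELS'] = [[float(v) for v in x.strip('()').split(',')] for x in df['LABELS']]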
Upvotes: 37
Reputation: 16404
You can use ast.literal_eval, which will give you a tuple:
import ast
df.LABELS = df.LABELS.apply(ast.literal_eval)
If you do want a list, use:
df.LABELS.apply(lambda s: list(ast.literal_eval(s)))
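You can also parse the column while reading the file via the converters argument of read_csv (a sketch; 'filename.csv' is just a placeholder name):
import ast
import pandas as pd

# parse LABELS into tuples at load time
df = pd.read_csv('filename.csv', converters={'LABELS': ast.literal_eval})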
Upvotes: 33
Reputation: 51345
You can try this (assuming your csv is called filename.csv):
df = pd.read_csv('filename.csv')
df['LABELS'] = df.LABELS.apply(lambda x: x.strip('()').split(','))
>>> df
ID LABELS
0 1 [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]
1 2 [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]
Upvotes: 2