Slater Victoroff

Reputation: 21914

Converting list of strings to list of floats in pandas

I have what I assumed would be a super basic problem, but I'm unable to find a solution. In short, I have a column in a CSV that holds a list of numbers. The CSV was generated by pandas with to_csv. When I read it back in with read_csv, the list of numbers is automatically converted into a string.

When I then try to use it I obviously get errors. Using the to_numeric function fails as well, because each value is a list, not a single number.

Is there any way to solve this? Posting code below for form, but probably not extremely helpful:

def write_func(dataset):
    features = featurize_list(dataset[column])  # Returns numpy array
    new_dataset = dataset.copy()  # Don't want to modify the underlying dataframe
    new_dataset['Text'] = features
    new_dataset.rename(columns={'Text': 'Features'}, inplace=True)
    write(new_dataset, dataset_name)

def write(new_dataset, dataset_name):
    dump_location = feature_set_location(dataset_name, self)
    new_dataset.to_csv(dump_location)

def read_func(read_location):
    df = pd.read_csv(read_location)
    df['Features'] = df['Features'].apply(pd.to_numeric)

The Features column is the one in question. When I attempt to run the apply currently in read_func I get this error:

ValueError: Unable to parse string "[0.019636873200000002, 0.10695576670000001,...]" at position 0

I can't be the first person to run into this issue. Is there some way to handle this at read/write time?

Upvotes: 2

Views: 3654

Answers (2)

Mohammad Akhtar

Reputation: 118

I have modified your last function a bit and it works fine.

def read_func(read_location):
    df = pd.read_csv(read_location)
    df['Features'] = df['Features'].apply(lambda x : pd.to_numeric(x))
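For reference, pd.to_numeric raises the ValueError from the question when it is handed a string that contains a whole list, so another option is to parse each cell after reading. A minimal sketch, assuming every cell is the string form of a Python list; the sample frame is hypothetical and stands in for the result of pd.read_csv:

```python
from ast import literal_eval

import pandas as pd

# Hypothetical frame standing in for the result of pd.read_csv:
# each 'Features' cell is the string form of a list, not a list.
df = pd.DataFrame({"Features": ["[0.5, 1.25]", "[2.0, 3.75]"]})

# Parse each string into a real Python list of floats.
df["Features"] = df["Features"].apply(literal_eval)
```

literal_eval only accepts Python literals, so a malformed cell raises an error instead of being silently coerced.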

Upvotes: 1

piRSquared

Reputation: 294218

You want to use literal_eval as a converter passed to pd.read_csv. Below is an example of how that works.

from ast import literal_eval
from io import StringIO
import pandas as pd

txt = """col1|col2
a|[1,2,3]
b|[4,5,6]"""

df = pd.read_csv(StringIO(txt), sep='|', converters=dict(col2=literal_eval))
print(df)

  col1       col2
0    a  [1, 2, 3]
1    b  [4, 5, 6]
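Applied to the case in the question, a to_csv/read_csv round trip might look like the sketch below. The frame contents, the Label column, and the in-memory buffer are made up for illustration; only the Features column name comes from the question. It assumes the cells are plain Python lists, which to_csv writes with commas; NumPy arrays are written without commas and literal_eval would not parse them.

```python
from ast import literal_eval
from io import StringIO

import pandas as pd

# Hypothetical data: each cell of 'Features' holds a list of floats.
df = pd.DataFrame({
    "Label": ["a", "b"],
    "Features": [[0.5, 1.25], [2.0, 3.75]],
})

# to_csv writes each list as its string representation, e.g. "[0.5, 1.25]".
buffer = StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# literal_eval turns each cell back into a Python list while reading.
restored = pd.read_csv(buffer, converters={"Features": literal_eval})
```

Doing the conversion in read_csv keeps the parsing in one place, so no separate to_numeric step is needed afterwards.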

Upvotes: 2
