x89

Reputation: 3470

String to Float in DataFrame's Column

The 5th column of my DataFrame is a list of floats. I want to replace the list with the maximum value from the list. How could I do so?

I'm trying this but I get an error:

import pandas as pd
import numpy as np

colNames = ['unixTime', 'sampleAmount','Time','samplingRate', 'Data']

data = pd.read_csv("project_fan.csv",  sep = ';', error_bad_lines = False, names = colNames) 
print(data.head())
data['Data'] = [float(x) for x in data.Data.values]
data['Data'] = [np.array(x).mean() for x in data.Data.values]

Traceback (most recent call last):
  File "new.py", line 9, in <module>
    data['Data'] = [float(x) for x in data.Data.values]
ValueError: could not convert string to float: [1618.6294555664062, 1619.0826416015625, 1620.0897216796875, 1620.0393676757812, 1620.0393676757812, 1620.240783691406, 1620.391845703125, 1620.0897216796875, 1619.435119628906, 1620.4925537109373, 16

I also tried astype(float).mean, but that doesn't work either.

Sample DataFrame:

       unixTime  sampleAmount  Time  samplingRate   Data
0  1.556891e+09         16384   340  48188.235294  [1618.6294555664062,1619.0826416015625,1620.489622]
1  1.556891e+09         16384   341  48046.920821  [1619.78759765625,1619.0826416015625,1620.49754]

Upvotes: 0

Views: 69

Answers (1)

filbranden

Reputation: 8898

From your error message, it's clear that each value in your "Data" column is stored as a string containing what looks like the Python representation of a list of floats. That is natural, considering the data comes from a CSV file, which has no other way to represent a list of numbers in a single column.

You can check that with type(data.Data[0]), which I expect will tell you str.
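As a quick check, here is a minimal sketch that reads two rows shaped like the sample in the question from an in-memory string rather than from project_fan.csv. The names csv_text, sample and first_cell are made up for illustration, and the first two columns are shortened.

```python
import io

import pandas as pd

# Two rows in the same layout as the question's sample (semicolon-separated).
csv_text = (
    "1556891000;16384;340;48188.235294;"
    "[1618.6294555664062,1619.0826416015625,1620.489622]\n"
    "1556891001;16384;341;48046.920821;"
    "[1619.78759765625,1619.0826416015625,1620.49754]\n"
)
col_names = ["unixTime", "sampleAmount", "Time", "samplingRate", "Data"]

sample = pd.read_csv(io.StringIO(csv_text), sep=";", names=col_names)

# Each cell of the last column is the raw text of the list, not a list.
first_cell = sample["Data"].iloc[0]
print(type(first_cell))
print(first_cell[:20])
```

Calling float() on a string like that fails because the brackets and commas are part of the text, which is what the traceback in the question shows.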

Since it looks like the Python representation of a list of floats, one good way to parse it is Python's ast module, whose ast.literal_eval() function evaluates a string containing a Python literal. That function can interpret basic Python types (integers, floats, strings, lists, tuples, dicts) without executing arbitrary code, so it's a safe way to parse contents coming from an external source such as a CSV file.

So you can convert it to an actual list of floats with:

import ast
data['Data'] = data.Data.transform(ast.literal_eval)

Another approach is to treat this column as JSON-encoded data and parse it as JSON instead. In this case, for a list of floats, the Python and JSON representations are the same, so either method should work. (JSON decoding may be faster, since JSON is generally simpler than the general Python literal syntax, but I haven't measured it.)

To decode it as JSON (alternative to the above):

import json
data['Data'] = data.Data.transform(json.loads)
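To illustrate that both parsers agree on this kind of input, here is a small sketch using only the standard library. The string raw is copied from the first row of the sample in the question; the variable names are just for illustration.

```python
import ast
import json

# One cell of the "Data" column, as it arrives from the CSV file.
raw = "[1618.6294555664062,1619.0826416015625,1620.489622]"

from_literal = ast.literal_eval(raw)  # parsed as a Python literal
from_json = json.loads(raw)           # parsed as a JSON array

# Both results are plain lists of Python floats with identical contents.
print(from_literal == from_json)
print(type(from_json), type(from_json[0]))
```

The two parsers are not interchangeable in general. For example, a trailing comma as in "[1.0, 2.0,]" is accepted by ast.literal_eval but rejected by json.loads, so the choice matters if the file contains anything other than plain lists of numbers.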

At this point (after either Python or JSON conversion), you can use functions such as np.mean on the result, since it's just a list of floats and no longer a string:

data['Data'] = data.Data.apply(np.mean)
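The question text asks for the maximum rather than the mean, so np.max can be swapped in the same way. A minimal end-to-end sketch, assuming a small hand-built frame with the two sample rows from the question (the names frame, parsed, DataMax and DataMean are hypothetical):

```python
import json

import numpy as np
import pandas as pd

# Stand-in for the frame returned by read_csv: "Data" holds strings.
frame = pd.DataFrame(
    {
        "Time": [340, 341],
        "Data": [
            "[1618.6294555664062,1619.0826416015625,1620.489622]",
            "[1619.78759765625,1619.0826416015625,1620.49754]",
        ],
    }
)

# Step 1: turn each string into a real list of floats.
parsed = frame["Data"].apply(json.loads)

# Step 2: reduce each list to a single number.
frame["DataMax"] = parsed.apply(np.max)
frame["DataMean"] = parsed.apply(np.mean)

print(frame[["Time", "DataMax", "DataMean"]])
```

To replace the original column instead of adding new ones, assign the result back to frame["Data"].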

Upvotes: 3
