Reputation: 3470
The 5th column of my DataFrame is a list of floats. I want to replace the list with the maximum value from the list. How could I do so?
I'm trying this but I get an error:
import pandas as pd
import numpy as np
colNames = ['unixTime', 'sampleAmount','Time','samplingRate', 'Data']
data = pd.read_csv("project_fan.csv", sep = ';', error_bad_lines = False, names = colNames)
print(data.head())
data['Data'] = [float(x) for x in data.Data.values]
data['Data'] = [np.array(x).mean() for x in data.Data.values]
Traceback (most recent call last):
File "new.py", line 9, in <module>
data['Data'] = [float(x) for x in data.Data.values]
ValueError: could not convert string to float: [1618.6294555664062, 1619.0826416015625, 1620.0897216796875, 1620.0393676757812, 1620.0393676757812, 1620.240783691406, 1620.391845703125, 1620.0897216796875, 1619.435119628906, 1620.4925537109373, 16
I also tried astype(float).mean, but that doesn't work either.
Sample DataFrame:
unixTime sampleAmount Time samplingRate Data
0 1.556891e+09 16384 340 48188.235294 [1618.6294555664062,1619.0826416015625,1620.489622]
1 1.556891e+09 16384 341 48046.920821 [1619.78759765625,1619.0826416015625,1620.49754]
Upvotes: 0
Views: 69
Reputation: 8898
From your error message, it's clear that the data in your "Data" column is stored as a string containing what looks like a Python representation of a list of floats. That is to be expected, since the data comes from a CSV file, which has no other way to represent a list of numbers in a single column.
You can check that with type(data.Data[0]), which I expect will tell you str.
Since it looks like a Python representation of a list of floats, one good way to parse it is with Python's ast module, which can evaluate a Python literal through the ast.literal_eval() function. That function interprets Python's basic types (integers, floats, strings, lists, tuples, dicts) without executing arbitrary code, so it's a much safer choice than eval() for parsing contents coming from an external source such as a CSV file.
So you can convert it to an actual list of floats with:
import ast
data['Data'] = data.Data.transform(ast.literal_eval)
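As a minimal sketch of what ast.literal_eval() does with one cell, the snippet below parses a string shaped like the sample rows in the question. The variable names (raw, values, rejected) are made up for illustration, and the second input is a hypothetical malicious string, included only to show that non-literal expressions are refused rather than executed.

```python
import ast

# One cell of the "Data" column, as it arrives from the CSV: a plain string.
raw = "[1618.6294555664062, 1619.0826416015625, 1620.489622]"

# literal_eval turns the text into a real Python list of floats.
values = ast.literal_eval(raw)
print(type(values).__name__, len(values))

# Anything that is not a plain literal (here, a function call) is rejected
# with a ValueError instead of being run.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print("non-literal input rejected:", rejected)
```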
Another approach is to treat this column as JSON-encoded data and parse it as JSON instead. For a list of floats, the Python and JSON representations happen to be equivalent, so either method should work. (JSON decoding may also be faster, since JSON is generally simpler than the full Python literal syntax.)
To decode it as JSON (alternative to the above):
import json
data['Data'] = data.Data.transform(json.loads)
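To illustrate the claim that both parsers agree on a list of floats, here is a small sketch using only the standard library. The names are invented for the example, and the tuple string is an assumed input (not from the question) that shows one place where the two syntaxes differ.

```python
import ast
import json

# Same shape as a "Data" cell in the sample DataFrame.
raw = "[1618.6294555664062,1619.0826416015625,1620.489622]"

from_literal = ast.literal_eval(raw)
from_json = json.loads(raw)
same_result = from_literal == from_json
print("parsers agree:", same_result)

# The two syntaxes are not identical in general: a tuple is a valid
# Python literal but not valid JSON.
tuple_text = "(1.0, 2.0)"
as_tuple = ast.literal_eval(tuple_text)
try:
    json.loads(tuple_text)
    json_accepts_tuple = True
except json.JSONDecodeError:
    json_accepts_tuple = False
print("JSON accepts tuple syntax:", json_accepts_tuple)
```

So the JSON route is only interchangeable as long as the column really contains bracketed lists of numbers, which is what the sample rows suggest.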
At this point (after either Python or JSON conversion), you can use functions such as np.mean on the result (or np.max, if you want the maximum that the question asks for), since each value is now a list of floats and no longer a string:
data['Data'] = data.Data.apply(np.mean)
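Putting the steps together, the following is a rough end-to-end sketch rather than a drop-in replacement for the original script. It assumes the file looks like the two sample rows in the question, so it reads them from an in-memory string (csv_text, with a rounded unixTime) instead of project_fan.csv, and it writes the results to two new, hypothetical columns (DataMax, DataMean) so both reductions can be compared.

```python
import io
import json

import numpy as np
import pandas as pd

# Stand-in for project_fan.csv, built from the sample rows in the question.
csv_text = (
    "1556891000;16384;340;48188.235294;"
    "[1618.6294555664062,1619.0826416015625,1620.489622]\n"
    "1556891000;16384;341;48046.920821;"
    "[1619.78759765625,1619.0826416015625,1620.49754]\n"
)
col_names = ["unixTime", "sampleAmount", "Time", "samplingRate", "Data"]
data = pd.read_csv(io.StringIO(csv_text), sep=";", names=col_names)

# Straight after reading, every "Data" cell is still a string.
first_cell_is_str = isinstance(data["Data"].iloc[0], str)

# Parse each cell into a list of floats (ast.literal_eval would also work).
data["Data"] = data["Data"].apply(json.loads)

# Reduce each list to a single number, one row at a time.
data["DataMax"] = data["Data"].apply(max)
data["DataMean"] = data["Data"].apply(np.mean)

print(data[["Time", "DataMax", "DataMean"]])
```

To overwrite the column as in the original answer, assign the reduced values back to data["Data"] instead of creating new columns.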
Upvotes: 3