Reputation: 25
I have a list of dictionaries with values that are returned as numpy arrays (and which are often empty).
data=[{'width': array([])},
{'width': array([])},
{'width': array([])},
{'width': array([])},
{'width': array([])},
{'width': array([ 0.64848222])},
{'width': array([ 0.62241745])},
{'width': array([ 0.76892571])},
{'width': array([ 0.69913647])},
{'width': array([ 0.7506934])},
{'width': array([ 0.69087949])},
{'width': array([ 0.65302866])},
{'width': array([ 0.67267989])},
{'width': array([ 0.63862089])}]
I would like to create a DataFrame where the values are floats rather than numpy arrays. I'd also like the empty arrays to be converted to NaN values.
I have tried using df=pd.DataFrame(data, dtype=float)
which returns a DataFrame whose values are still np.arrays:
width
0 []
1 []
2 []
3 []
4 []
5 [0.648482224582]
6 [0.622417447245]
7 [0.768925710479]
8 [0.699136467373]
9 [0.75069339816]
10 [0.690879488242]
11 [0.653028655088]
12 [0.672679885077]
13 [0.638620890633]
I've also tried recasting the df's values after creating it using df.values.astype(float)
but get the following error:
ValueError: setting an array element with a sequence.
The final output I am trying to get for the DataFrame looks like:
width
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 0.648482224582
6 0.622417447245
7 0.768925710479
8 0.699136467373
9 0.75069339816
10 0.690879488242
11 0.653028655088
12 0.672679885077
13 0.638620890633
Upvotes: 1
Views: 497
Reputation: 176820
After you've constructed the DataFrame from data, the only extra thing you need to do is:
df.width = df.width.str[0]
This works because the .str accessor is being used to get the first element of each array. Empty arrays don't have a first element, so NaN is returned for those rows.
You end up with a column of float64 values:
width
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 0.648482
6 0.622417
7 0.768926
8 0.699136
9 0.750693
10 0.690879
11 0.653029
12 0.672680
13 0.638621
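For reference, here is a minimal self-contained sketch of the same approach. The three-row data list is a shortened stand-in for the data in the question, and the trailing astype(float) is an extra defensive step (not part of the one-liner above) that makes the float64 dtype explicit.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's data: one empty and two one-element arrays.
data = [{'width': np.array([])},
        {'width': np.array([0.64848222])},
        {'width': np.array([0.62241745])}]

df = pd.DataFrame(data)

# .str[0] picks the first element of each array; empty arrays yield NaN.
# astype(float) is an added safeguard so the column is float64 regardless of inference.
df['width'] = df['width'].str[0].astype(float)
```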
Note: if you want to display more decimal places, you'll need to adjust the float display precision using pd.set_option (the 'display.precision' option).
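As a rough illustration of that setting, the sketch below raises the display precision to 12 decimal places. The two-row frame is made up for the example, and the value 12 is just an assumption that matches the number of digits shown in the question.

```python
import pandas as pd

# Illustrative frame holding two of the values from the question.
df = pd.DataFrame({'width': [0.648482224582, 0.622417447245]})

# Affects only how floats are printed; the stored values are unchanged.
pd.set_option('display.precision', 12)

text = df.to_string()
```

Calling pd.reset_option('display.precision') restores the default afterwards.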
Alternatively, you can process the list before you construct the DataFrame:
pd.DataFrame([x.get('width') for x in data], columns=['width'])
Upvotes: 1
Reputation: 109546
You can use a list comprehension to extract the data from the array in the dictionary. d['width'][0] will extract the first value from the array. if d['width'].shape[0] will evaluate to False if the array is empty, in which case None is inserted.
>>> pd.DataFrame([d['width'][0] if d['width'].shape[0] else None for d in data],
columns=['width'])
width
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 0.648482
6 0.622417
7 0.768926
8 0.699136
9 0.750693
10 0.690879
11 0.653029
12 0.672680
13 0.638621
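A small variation on the same idea, shown as a sketch rather than a required change: it tests the array's size attribute and inserts np.nan directly instead of None. The two-row data list is a shortened stand-in for the question's data.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's data.
data = [{'width': np.array([])},
        {'width': np.array([0.7506934])}]

# arr.size is 0 for an empty array, so those rows get NaN directly.
widths = [d['width'][0] if d['width'].size else np.nan for d in data]

df = pd.DataFrame({'width': widths})
```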
Upvotes: 1
Reputation: 81604
Try this after getting the dataframe you posted:
import numpy as np

def convert(x):
    # Empty arrays become NaN; otherwise take the single stored value.
    if len(x) == 0:
        return np.nan
    else:
        return x[0]

df['width'] = df['width'].apply(convert)
Upvotes: 0