pamplemoose

Reputation: 25

Creating DataFrame with list of dictionaries with np.array values

I have a list of dictionaries with values that are returned as numpy arrays (and which are often empty).

import numpy as np
import pandas as pd
from numpy import array  # the values below are printed NumPy array reprs

data=[{'width': array([])},
      {'width': array([])},
      {'width': array([])},
      {'width': array([])},
      {'width': array([])},
      {'width': array([ 0.64848222])},
      {'width': array([ 0.62241745])},
      {'width': array([ 0.76892571])},
      {'width': array([ 0.69913647])},
      {'width': array([ 0.7506934])},
      {'width': array([ 0.69087949])},
      {'width': array([ 0.65302866])},
      {'width': array([ 0.67267989])},
      {'width': array([ 0.63862089])}]

I would like to create a DataFrame where the values are floats rather than NumPy arrays. I'd also like the empty arrays to be converted to NaN values.

I have tried using df=pd.DataFrame(data, dtype=float), which returns a DataFrame whose values are still NumPy arrays:

               width
0                 []
1                 []
2                 []
3                 []
4                 []
5   [0.648482224582]
6   [0.622417447245]
7   [0.768925710479]
8   [0.699136467373]
9    [0.75069339816]
10  [0.690879488242]
11  [0.653028655088]
12  [0.672679885077]
13  [0.638620890633]

I've also tried recasting the DataFrame's values after creating it, using df.values.astype(float), but I get the following error: ValueError: setting an array element with a sequence.

The final output I am trying to get for the DataFrame looks like:

               width
0                NaN
1                NaN
2                NaN
3                NaN
4                NaN
5     0.648482224582
6     0.622417447245
7     0.768925710479
8     0.699136467373
9      0.75069339816
10    0.690879488242
11    0.653028655088
12    0.672679885077
13    0.638620890633

Upvotes: 1

Views: 497

Answers (3)

Alex Riley

Reputation: 176820

After you've constructed the DataFrame from data, the only extra thing you need to do is:

df.width = df.width.str[0]

This works because indexing through the .str accessor (.str[0], equivalent to .str.get(0)) is not limited to strings: it takes the first element of each array in the column. Empty arrays don't have a first element, so NaN is returned for those rows.

You end up with a column of float64 values:

       width
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
5   0.648482
6   0.622417
7   0.768926
8   0.699136
9   0.750693
10  0.690879
11  0.653029
12  0.672680
13  0.638621
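A self-contained sketch of the .str[0] approach, using a shortened three-row version of the question's data (truncated only for brevity):

```python
import numpy as np
import pandas as pd

# Shortened version of the question's data: each value is a NumPy array
# holding either zero or one float.
data = [{'width': np.array([])},
        {'width': np.array([0.64848222])},
        {'width': np.array([0.62241745])}]

df = pd.DataFrame(data)            # 'width' starts as an object column of arrays
df['width'] = df['width'].str[0]   # first element of each array, NaN if empty

print(df)
print(df['width'].dtype)
```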

Note: if you want to display more decimal places, adjust pandas' display precision with pd.set_option. This only changes how the values are printed, not the stored values.
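For example, a minimal sketch assuming a pandas version that provides the `display.precision` option:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'width': [np.nan, 0.648482224582, 0.622417447245]})

# Print floats with up to 12 digits after the decimal point.
# Display-only: the stored float64 values are not rounded.
pd.set_option('display.precision', 12)
print(df)
```

To change the setting only temporarily, `pd.option_context('display.precision', 12)` can be used as a context manager around the `print` call instead.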

Alternatively, you can process the list before you construct the DataFrame:

pd.DataFrame([x.get('width') for x in data], columns=['width'])
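If each dictionary could carry more than one array-valued key, the same preprocessing idea can be applied per key. A minimal sketch, assuming every array holds at most one value (any further elements would be silently dropped); the helper name `first_or_nan` is hypothetical, not part of NumPy or pandas:

```python
import numpy as np
import pandas as pd

def first_or_nan(arr):
    """Hypothetical helper: first element of a 1-D array, or NaN if it is empty."""
    return arr[0] if arr.size else np.nan

data = [{'width': np.array([])},
        {'width': np.array([0.64848222])},
        {'width': np.array([0.62241745])}]

# Flatten every array-valued entry of every dict before building the frame.
rows = [{key: first_or_nan(val) for key, val in d.items()} for d in data]
df = pd.DataFrame(rows)
print(df.dtypes)
```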

Upvotes: 1

Alexander

Reputation: 109546

You can use a list comprehension to extract the value from the array in each dictionary. d['width'][0] extracts the first value from the array, and the condition d['width'].shape[0] evaluates to False when the array is empty (its length is 0), in which case None is inserted; pandas then stores None as NaN in the float column.

>>> pd.DataFrame([d['width'][0] if d['width'].shape[0] else None for d in data],
...              columns=['width'])
       width
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
5   0.648482
6   0.622417
7   0.768926
8   0.699136
9   0.750693
10  0.690879
11  0.653029
12  0.672680
13  0.638621
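The two facts this relies on can be checked in isolation: the length of an empty 1-D array is 0, which is falsy, and pandas stores None as NaN when it sits alongside floats. A small sketch (the two arrays are stand-ins for values from the question's data):

```python
import numpy as np
import pandas as pd

empty = np.array([])
full = np.array([0.64848222])

# shape[0] is the length of a 1-D array; 0 is falsy, so the empty case
# falls through to the `else None` branch of the comprehension.
print(bool(empty.shape[0]), bool(full.shape[0]))

# None mixed with floats becomes NaN in a float64 column.
df = pd.DataFrame([None, full[0]], columns=['width'])
print(df['width'].dtype)
```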

Upvotes: 1

DeepSpace

Reputation: 81604

Try this after getting the dataframe you posted:

import numpy as np

def convert(x):
    # Empty array -> NaN; otherwise return its first (only) element.
    if len(x) == 0:
        return np.nan
    else:
        return x[0]

df['width'] = df['width'].apply(convert)

Upvotes: 0
