user1821176

Reputation: 1191

Replacing empty or missing values with zeros in a large array

I have a large array of more than 40000 elements

a = ['15', '12', '', 18909, ...., '8989', '', '90789', '8']

I'm looking for a simple way to replace the empty '' values with '0' so that I can manipulate the data in the array using NumPy.

I would then convert the elements in my array into integers using

a = map(int, a)

so that I could find the mean of the array in numpy

a_mean = np.mean(a)

My issue is that int() fails on the empty strings, so I cannot convert the array to integers and compute the mean.

Upvotes: 3

Views: 3352

Answers (4)

Bas Swinckels

Reputation: 18488

You could make a small function that converts a single value exactly how you want it, e.g.:

def to_int(x):
    try:
        return int(x)
    except ValueError:
        return 0

which can be used with map:

In [22]: a = ['15', '12', '', 18909, '8989', '90789', '8']

In [23]: list(map(to_int, a))
Out[23]: [15, 12, 0, 18909, 8989, 90789, 8]

in a list comprehension:

In [25]: np.array([to_int(x) for x in a])
Out[25]: array([   15,    12,     0, 18909,  8989, 90789,     8])

or in a generator expression to directly create a numpy array:

In [27]: np.fromiter((to_int(x) for x in a), dtype=int)
Out[27]: array([   15,    12,     0, 18909,  8989, 90789,     8])
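
Putting these pieces together for the mean asked about in the question, a minimal sketch could look like the following. It assumes Python 3 and NumPy; the sample list is a shortened, made-up stand-in for the 40000-element array, and to_int is the helper defined above, repeated so the block runs on its own. Note that counting missing entries as 0 pulls the mean down.

```python
import numpy as np

def to_int(x):
    """Convert x to int, mapping values int() rejects (such as '') to 0."""
    try:
        return int(x)
    except ValueError:
        return 0

# Shortened stand-in for the real data (assumed sample values).
a = ['15', '12', '', 18909, '8989', '', '90789', '8']

# Build the integer array directly from a generator, then average it.
arr = np.fromiter((to_int(x) for x in a), dtype=int)
a_mean = arr.mean()
```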

Upvotes: 5

Karn Kumar

Reputation: 8816

Based on what I have learned previously on SO, you can apply the solution below to convert NaN values to zeros:

import numpy as np

a = np.array([[0, 1, 2], [3, 4, np.nan]])
where_are_nans = np.isnan(a)  # boolean mask marking the NaN entries
a[where_are_nans] = 0

Secondly, you can use nan_to_num(), as I mentioned earlier in my comment:

>>> import numpy as np
>>> a = np.array([[0, 1, 2], [3, 4, np.nan]])
>>> a
array([[  0.,   1.,   2.],
       [  3.,   4.,  nan]])
>>> a = np.nan_to_num(a)
>>> a
array([[ 0.,  1.,  2.],
       [ 3.,  4.,  0.]])
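
To apply this to the list of strings from the question, the empty strings first have to become NaN. A rough sketch, assuming that '' is the only missing-value marker and that a float array is acceptable (the names raw, as_float and filled are illustrative, and the sample values are made up):

```python
import numpy as np

# Shortened stand-in for the real data (assumed sample values).
raw = ['15', '12', '', 18909, '8989', '', '90789', '8']

# Map '' to NaN and everything else to float.
as_float = np.array([float(x) if x != '' else np.nan for x in raw])

# Option 1: replace NaN with 0.0, so missing entries count as zeros.
filled = np.nan_to_num(as_float)
mean_with_zeros = filled.mean()

# Option 2: leave the NaNs in place and let np.nanmean skip them.
mean_ignoring_missing = np.nanmean(as_float)
```

The two means differ: the first divides by all eight entries, the second only by the six that are present. Which one is appropriate depends on what a missing value is supposed to mean in the data.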

Upvotes: 1

MercyDude

Reputation: 914

If I understood you correctly, it should look like this:

for i in range(len(a)):
    if a[i] == '':
        a[i] = '0'

You can also use:

a = list(map(lambda x: '0' if x == '' else x, a))

Upvotes: 2

bunbun

Reputation: 2676

A more verbose answer is:

acc = 0
for v in a:
    acc += int(v or 0)  # '' is falsy, so ('' or 0) evaluates to 0
a_mean = acc/len(a)

Upvotes: 0
