Terence Chow
Terence Chow

Reputation: 11153

replace blanks in numpy array

The third column in my numpy array is Age. In this column about 75% of the entries are valid and 25% are blank. Column 2 is Gender and using some manipulation I have calculated the average age of the men in my dataset to be 30. The average age of women in my dataset is 28.

I want to replace all blank Age values for men to be 30 and all blank age values for women to be 28.

However I can't seem to do this. Anyone have a suggestion or know what I am doing wrong?

Here is my code:

# my entire data set is stored in a numpy array defined as x

ismale = x[::,1]=='male'
maleAgeBlank = x[ismale][::,2]==''
x[ismale][maleAgeBlank][::,2] = 30 

For whatever reason when I'm done with the above code, I type x to display the data set and the blanks still exist even though I set them to 30. Note that I cannot do x[maleAgeBlank] because that list will include some female data points since the female data points are not yet excluded.

Is there any way to get what I want? For some reason, if I do x[ismale][::,1] = 1 (setting the column with 'male' equal to 1), that works, but x[ismale][maleAgeBlank][::,2] = 30 does not work.

sample of array:

#output from typing x
array([['3', '1', '22', ..., '0', '7.25', '2'],
   ['1', '0', '38', ..., '0', '71.2833', '0'],
   ['3', '0', '26', ..., '0', '7.925', '2'],
   ..., 
   ['3', '0', '', ..., '2', '23.45', '2'],
   ['1', '1', '26', ..., '0', '30', '0'],
   ['3', '1', '32', ..., '0', '7.75', '1']], 
  dtype='<U82')

#output from typing x[0]

array(['3', '1', '22', '1', '0', '7.25', '2'], 
  dtype='<U82')

Note that I have changed column 2 to be 0 for female and 1 for male already in the above output

Upvotes: 4

Views: 3258

Answers (3)

user1301404
user1301404

Reputation:

You can use the where function:

arr = array([['3', '1', '22', '1', '0', '7.25', '2'], 
            ['3', '', '22', '1', '0', '7.25', '2']], 
           dtype='<U82')

blank = np.where(arr=='')

arr[blank] = 20

array([[u'3', u'1', u'22', u'1', u'0', u'7.25', u'2'],
       [u'3', u'20', u'22', u'1', u'0', u'7.25', u'2']], 
      dtype='<U82')

If you want to change a specific column you can do the do the following:

male = np.where(arr[:, 1]=='') # where 1 is the column
arr[male] = 30

female = np.where(arr[:, 2]=='') # where 2 is the column
arr[female] = 28

Upvotes: 2

Akavall
Akavall

Reputation: 86128

How about this:

my_data =  np.array([['3', '1', '22', '0', '7.25', '2'],
                     ['1', '0', '38', '0', '71.2833', '0'],
                     ['3', '0', '26', '0', '7.925', '2'],
                     ['3', '0', '', '2', '23.45', '2'],
                     ['1', '1', '26', '0', '30', '0'],
                     ['3', '1', '32', '0', '7.75', '1']], 
                     dtype='<U82')

ismale = my_data[:,1] == '0'
missing_age = my_data[:, 2] == ''
maleAgeBlank = missing_age & ismale
my_data[maleAgeBlank, 2] = '30'

Result:

>>> my_data
array([[u'3', u'1', u'22', u'0', u'7.25', u'2'],
       [u'1', u'0', u'38', u'0', u'71.2833', u'0'],
       [u'3', u'0', u'26', u'0', u'7.925', u'2'],
       [u'3', u'0', u'30', u'2', u'23.45', u'2'], 
       [u'1', u'1', u'26', u'0', u'30', u'0'],
       [u'3', u'1', u'32', u'0', u'7.75', u'1']], 
      dtype='<U82')

Upvotes: 3

ASGM
ASGM

Reputation: 11381

You could try iterating through the array in a simpler way. It's not the most efficient solution, but it should get the job done.

for row in range(len(x)):
    if row[2] == '':
        if row[1] == 1:
            row[2] == 30
        else:
            row[2] == 28

Upvotes: 0

Related Questions