Reputation: 11153
The third column in my numpy array is Age. In this column about 75% of the entries are valid and 25% are blank. Column 2 is Gender and using some manipulation I have calculated the average age of the men in my dataset to be 30. The average age of women in my dataset is 28.
I want to replace all blank Age values for men with 30 and all blank Age values for women with 28.
However I can't seem to do this. Anyone have a suggestion or know what I am doing wrong?
Here is my code:
# my entire data set is stored in a numpy array defined as x
ismale = x[::,1]=='male'
maleAgeBlank = x[ismale][::,2]==''
x[ismale][maleAgeBlank][::,2] = 30
For whatever reason, when I'm done with the above code and type x to display the data set, the blanks still exist even though I set them to 30. Note that I cannot do x[maleAgeBlank] because that list will include some female data points since the female data points are not yet excluded.
Is there any way to get what I want? For some reason, if I do x[ismale][::,1] = 1 (setting the column with 'male' equal to 1), that works, but x[ismale][maleAgeBlank][::,2] = 30 does not work.
sample of array:
#output from typing x
array([['3', '1', '22', ..., '0', '7.25', '2'],
['1', '0', '38', ..., '0', '71.2833', '0'],
['3', '0', '26', ..., '0', '7.925', '2'],
...,
['3', '0', '', ..., '2', '23.45', '2'],
['1', '1', '26', ..., '0', '30', '0'],
['3', '1', '32', ..., '0', '7.75', '1']],
dtype='<U82')
#output from typing x[0]
array(['3', '1', '22', '1', '0', '7.25', '2'],
dtype='<U82')
Note that I have changed column 2 to be 0 for female and 1 for male already in the above output
Upvotes: 4
Views: 3258
Reputation:
You can use the where function:
import numpy as np
arr = np.array([['3', '1', '22', '1', '0', '7.25', '2'],
                ['3', '', '22', '1', '0', '7.25', '2']],
               dtype='<U82')
blank = np.where(arr == '')  # indices of every blank cell
arr[blank] = '20'
array([[u'3', u'1', u'22', u'1', u'0', u'7.25', u'2'],
[u'3', u'20', u'22', u'1', u'0', u'7.25', u'2']],
dtype='<U82')
If you want to change a specific column, you can do the following:
male = np.where(arr[:, 1] == '')[0]  # rows where column 1 is blank
arr[male, 1] = '30'  # assign only column 1, not the whole row
female = np.where(arr[:, 2] == '')[0]  # rows where column 2 is blank
arr[female, 2] = '28'
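To tie the fill value to gender rather than to whichever column is blank, the three-argument form of np.where may also help. The following is only a sketch: the sample rows are made up, and it assumes the encoding from the question ('1' for male, '0' for female) with Gender in column 1 and Age in column 2.

```python
import numpy as np

# Hypothetical sample: columns are class, gender ('1' = male, '0' = female), age.
arr = np.array([['3', '1', '22'],
                ['3', '1', ''],
                ['1', '0', ''],
                ['1', '0', '38']], dtype='<U82')

gender = arr[:, 1]
age = arr[:, 2]

# Choose a fill value per row from the gender code...
fill = np.where(gender == '1', '30', '28')
# ...and use it only where the age is blank, keeping existing ages otherwise.
arr[:, 2] = np.where(age == '', fill, age)
```

Here np.where(condition, a, b) builds a new array, so nothing is changed until the result is assigned back to the Age column.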
Upvotes: 2
Reputation: 86128
How about this:
import numpy as np
my_data = np.array([['3', '1', '22', '0', '7.25', '2'],
['1', '0', '38', '0', '71.2833', '0'],
['3', '0', '26', '0', '7.925', '2'],
['3', '0', '', '2', '23.45', '2'],
['1', '1', '26', '0', '30', '0'],
['3', '1', '32', '0', '7.75', '1']],
dtype='<U82')
ismale = my_data[:, 1] == '1'  # the question encodes male as '1', female as '0'
missing_age = my_data[:, 2] == ''
maleAgeBlank = missing_age & ismale
femaleAgeBlank = missing_age & ~ismale
my_data[maleAgeBlank, 2] = '30'
my_data[femaleAgeBlank, 2] = '28'
Result:
>>> my_data
array([[u'3', u'1', u'22', u'0', u'7.25', u'2'],
[u'1', u'0', u'38', u'0', u'71.2833', u'0'],
[u'3', u'0', u'26', u'0', u'7.925', u'2'],
[u'3', u'0', u'28', u'2', u'23.45', u'2'],
[u'1', u'1', u'26', u'0', u'30', u'0'],
[u'3', u'1', u'32', u'0', u'7.75', u'1']],
dtype='<U82')
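As for why the original attempt has no effect: indexing with a boolean mask returns a copy, so a chained expression such as x[ismale][...] = 30 assigns into a temporary array and leaves x untouched. A small sketch of the difference, using made-up rows and assuming '1' encodes male as in the question:

```python
import numpy as np

# Hypothetical sample: columns are class, gender ('1' = male), age.
x = np.array([['3', '1', ''],
              ['1', '0', ''],
              ['3', '1', '32']], dtype='<U82')

is_male = x[:, 1] == '1'

# Chained indexing: x[is_male] is a copy, so this assignment is lost.
sub_blank = x[is_male][:, 2] == ''
x[is_male][sub_blank, 2] = '30'
chained_result = x[:, 2].copy()   # x is unchanged at this point

# One indexing step with a combined mask writes into x itself.
blank_male = is_male & (x[:, 2] == '')
x[blank_male, 2] = '30'
single_result = x[:, 2].copy()    # only the blank male age is filled
```

Combining the conditions into one mask first, then indexing x exactly once on the left-hand side of the assignment, is what makes the write reach the original array.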
Upvotes: 3
Reputation: 11381
You could try iterating through the array in a simpler way. It's not the most efficient solution, but it should get the job done.
for row in x:  # each row is a view, so assignments write back into x
    if row[2] == '':
        if row[1] == '1':  # '1' encodes male
            row[2] = '30'
        else:
            row[2] = '28'
Upvotes: 0