mfcabrera

Reputation: 781

How to replace specific entries of a NumPy array based on its content

So let's say I have a simple matrix stored as a NumPy ndarray (just an example of what part of the data might look like):

import numpy as np
a = np.asarray([['1.0', 'Miami'],
                ['2.0', 'Boston'],
                ['1.4', 'Miami']])

I want to do data analysis on this complex data set ;) - I want to transform 'Miami' into 0 and 'Boston' into 1 in order to use a really fancy ML algorithm.

What is a good way to accomplish this in Python?
(I am not asking for the obvious approach of iterating and using a dictionary / if statement to replace each entry, but rather whether there is a better way to do this using NumPy or native Python.)

Upvotes: 1

Views: 141

Answers (1)

Andy Hayden

Reputation: 375435

pandas is a good tool for this.
First convert the array to a DataFrame:

In [11]: import pandas as pd

In [12]: df = pd.DataFrame(a, columns=['value', 'city'])

Then replace the entries in the city column:

In [13]: df.city = df.city.replace({'Miami': 0, 'Boston': 1})

In [14]: df
Out[14]:
  value city
0   1.0    0
1   2.0    1
2   1.4    0
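
If you would rather stay in NumPy, one possible approach is np.unique with return_inverse=True, which gives the distinct labels plus, for each row, the index of its label. The sketch below is only an illustration, not part of the pandas approach above. The names mapping, labels, inverse, codes and values are made up for this example. Note that np.unique sorts the labels alphabetically ('Boston' before 'Miami'), so the explicit mapping is what keeps 'Miami' at 0 and 'Boston' at 1.

```python
import numpy as np

a = np.asarray([['1.0', 'Miami'],
                ['2.0', 'Boston'],
                ['1.4', 'Miami']])

# Desired encoding from the question (hypothetical variable name).
mapping = {'Miami': 0, 'Boston': 1}

# labels: sorted unique city names; inverse: index into labels for each row.
labels, inverse = np.unique(a[:, 1], return_inverse=True)

# Translate each unique label once, then expand back to one code per row.
codes = np.array([mapping[label] for label in labels])[inverse]

# The first column holds numeric strings, so it can be cast separately.
values = a[:, 0].astype(float)

print(codes)
print(values)
```

The dictionary lookup runs once per distinct city rather than once per row, so the only Python-level loop is over the unique labels.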

Upvotes: 2
