Reputation: 1499
Suppose I have a DataFrame, in which one of the columns (we'll call it 'power') holds integer values from 1 to 10000. I would like to produce a numpy array which has, for each row, a value indicating whether the corresponding row of the DataFrame has a value in the 'power' column which is greater than 9000.
I could do something like this:
import numpy as np
def categorize(frame):
    return np.array(frame['power'] > 9000)
This gives me a boolean array whose entries can be tested against True and False. However, suppose I want the contents of the array to be 1 and -1, rather than True and False. How can I accomplish this without having to iterate through each row in the frame?
For background, the application is preparing data for binary classification via machine learning with scikit-learn.
Upvotes: 1
Views: 2096
Reputation: 76336
You can use np.where for this type of stuff.
Consider the following:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': range(20)})
df['even'] = df.a % 2 == 0
So now even is a boolean column. To create an array the way you like, you can use
np.where(df.even, 1, -1)
You can assign this back to the DataFrame, if you like:
df['foo'] = np.where(df.even, 1, -1)
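Applied to the question's setup, the same idea might look like the sketch below. The sample values and the threshold parameter are made up for illustration; only the 'power' column name and the 9000 cutoff come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'power' column name is from the question.
df = pd.DataFrame({'power': [1, 9000, 9001, 10000, 42]})

def categorize(frame, threshold=9000):
    """Return a NumPy array with 1 where 'power' > threshold and -1 elsewhere."""
    # np.where evaluates the condition element-wise, so no explicit row loop is needed.
    return np.where(frame['power'] > threshold, 1, -1)

labels = categorize(df)
print(labels.tolist())  # [-1, -1, 1, 1, -1]
```

The result is a plain integer array, which should be usable as a label vector for a binary classifier. Note that 9000 itself maps to -1 because the comparison is strict.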
See the pandas cookbook for more of this sort of stuff.
Upvotes: 2