Antonio Serrano

Reputation: 942

Python Pandas: Generate dummy variable from numeric variable according to a threshold

The goal is to add a new column to df that holds 1 if the value in column '% Renewable' is at or above the median, and 0 if it is below the median.

import numpy as np
import pandas as pd
df = pd.DataFrame({'% Renewable': [np.nan, 12, np.nan, 11, 17, 62, 18, 15, np.nan, 2, np.nan, np.nan, 6, np.nan, 70]},
                  index=['China', 'United States', 'Japan', 'United Kingdom', 'Russian Federation', 'Canada', 'Germany', 'India', 'France', 'South Korea', 'Italy', 'Spain', 'Iran', 'Australia', 'Brazil'])

I got the median:

median = df['% Renewable'].median()

But now what? Should I use the get_dummies function, or perhaps cut?

Upvotes: 1

Views: 1746

Answers (1)

Guido

Reputation: 6752

This should do the trick:

df['new_column'] = (df['% Renewable'] >= median).astype(int)  # cast True/False to 1/0
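
For reference, here is a self-contained sketch of that comparison on the question's data. Two things are worth noting: the comparison itself returns booleans, so a cast is needed to get 1/0, and any comparison with NaN is False, so countries with no value end up as 0. The second column, new_column_na, is a hypothetical name used only for illustration; it assumes you would rather keep missing inputs as missing, which the question does not specify.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'% Renewable': [np.nan, 12, np.nan, 11, 17, 62, 18, 15,
                     np.nan, 2, np.nan, np.nan, 6, np.nan, 70]},
    index=['China', 'United States', 'Japan', 'United Kingdom',
           'Russian Federation', 'Canada', 'Germany', 'India', 'France',
           'South Korea', 'Italy', 'Spain', 'Iran', 'Australia', 'Brazil'],
)

# Series.median() skips NaN by default, so only the 9 known values count.
median = df['% Renewable'].median()

# Boolean comparison cast to 1/0. NaN >= median is False, so NaN rows become 0.
df['new_column'] = (df['% Renewable'] >= median).astype(int)

# Assumed variant: keep missing inputs as <NA> using the nullable Int64 dtype.
flag = (df['% Renewable'] >= median).astype('Int64')
df['new_column_na'] = flag.where(df['% Renewable'].notna())

print(df)
```

If the 0 for missing values is what you want, the first column is enough and the nullable variant can be dropped.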

Upvotes: 2
