Reputation: 2096
I couldn't find an existing solution to my problem.
In my dataset I have a column of weather events. I need to convert it into multiple numeric indicator columns, and I'm looking for a quick way to do it.
weather = pd.read_csv("weather.csv", parse_dates=[0])
The Events column looks like:
id Events
0 Rain
...
1 Rain
...
8 Fog-Rain
9 Rain-Snow
I need to convert it to 4 features:
events = ['Rain','Snow','Fog','Thunderstorm']
Each one should take one of two values, 1 or 0.
How can I do it with pandas?
Upvotes: 2
Views: 272
Reputation: 186
str.get_dummies handles this very cleanly:
import pandas as pd
events_list = ['Rain', 'Rain', 'Fog-Rain', 'Rain-Snow', 'Thunderstorm', 'Fog-Thunderstorm']
weather_df = pd.DataFrame(events_list, columns=['Events'])
print(weather_df)
output:
Events
0 Rain
1 Rain
2 Fog-Rain
3 Rain-Snow
4 Thunderstorm
5 Fog-Thunderstorm
We call str.get_dummies with sep='-' and join the result to the original dataframe:
weather_df = pd.concat([weather_df, weather_df.Events.str.get_dummies(sep='-')], axis=1)
print(weather_df)
output:
Events Fog Rain Snow Thunderstorm
0 Rain 0 1 0 0
1 Rain 0 1 0 0
2 Fog-Rain 1 1 0 0
3 Rain-Snow 0 1 1 0
4 Thunderstorm 0 0 0 1
5 Fog-Thunderstorm 1 0 0 1
You can easily drop the original column if you wish.
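If you want exactly the four features from the question, in that order, and without the original column, something like the following should work. This is only a sketch: the small sample frame and the names indicators and result are made up for illustration, and reindex is used so that an event missing from the data (Thunderstorm here) still gets an all-zero column.

```python
import pandas as pd

# The four target features from the question, in the desired column order.
events = ['Rain', 'Snow', 'Fog', 'Thunderstorm']

# Hypothetical sample data; a real frame would come from weather.csv.
weather_df = pd.DataFrame({
    'id': [0, 1, 8, 9],
    'Events': ['Rain', 'Rain', 'Fog-Rain', 'Rain-Snow'],
})

# Split each value on '-' and build one 0/1 column per event found,
# then force the column set to `events`, filling absent events with 0.
indicators = (
    weather_df['Events']
    .str.get_dummies(sep='-')
    .reindex(columns=events, fill_value=0)
)

# Drop the original text column and attach the indicator columns.
result = weather_df.drop(columns='Events').join(indicators)
print(result)
```

Note that reindex also discards any event that is not in the events list, so check the raw values first if the data might contain other event types.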
Upvotes: 3
Reputation: 4375
Since Events contains combined values such as Fog-Rain, you cannot use plain pd.get_dummies: it would create a column for every distinct combination. Use str.contains() to find the matches and create the columns.
I used 0 for true and -1 for false, but you could substitute other values.
df
Out[48]:
id Events
0 0 Rain
1 1 Rain
2 8 Fog-Rain
3 9 Rain-Snow
4 32 Thunderstorm
5 31 Fog
6 23 Snow
df.Events.str.contains("Rain")
Out[49]:
0 True
1 True
2 True
3 True
4 False
5 False
6 False
Name: Events, dtype: bool
df.loc[df.Events.str.contains("Rain"), "Rain"] = 0
df
Out[51]:
id Events Rain
0 0 Rain 0
1 1 Rain 0
2 8 Fog-Rain 0
3 9 Rain-Snow 0
4 32 Thunderstorm NaN
5 31 Fog NaN
6 23 Snow NaN
df.loc[df.Events.str.contains("Snow"), "Snow"] = 0
df
Out[53]:
id Events Rain Snow
0 0 Rain 0 NaN
1 1 Rain 0 NaN
2 8 Fog-Rain 0 NaN
3 9 Rain-Snow 0 0
4 32 Thunderstorm NaN NaN
5 31 Fog NaN NaN
6 23 Snow NaN 0
df.loc[df.Events.str.contains("Thunderstorm"), "Thunderstorm"] = 0
df
Out[55]:
id Events Rain Snow Thunderstorm
0 0 Rain 0 NaN NaN
1 1 Rain 0 NaN NaN
2 8 Fog-Rain 0 NaN NaN
3 9 Rain-Snow 0 0 NaN
4 32 Thunderstorm NaN NaN 0
5 31 Fog NaN NaN NaN
6 23 Snow NaN 0 NaN
df.loc[df.Events.str.contains("Fog"), "Fog"] = 0
df
Out[57]:
id Events Rain Snow Thunderstorm Fog
0 0 Rain 0 NaN NaN NaN
1 1 Rain 0 NaN NaN NaN
2 8 Fog-Rain 0 NaN NaN 0
3 9 Rain-Snow 0 0 NaN NaN
4 32 Thunderstorm NaN NaN 0 NaN
5 31 Fog NaN NaN NaN 0
6 23 Snow NaN 0 NaN NaN
df = df.fillna(-1)
df
Out[59]:
id Events Rain Snow Thunderstorm Fog
0 0 Rain 0 -1 -1 -1
1 1 Rain 0 -1 -1 -1
2 8 Fog-Rain 0 -1 -1 0
3 9 Rain-Snow 0 0 -1 -1
4 32 Thunderstorm -1 -1 0 -1
5 31 Fog -1 -1 -1 0
6 23 Snow -1 0 -1 -1
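The same str.contains() steps can also be written as a loop that produces the 1/0 values the question asked for. This is a rough sketch rather than the only way to do it: the events list comes from the question, regex=False treats each name as a literal substring, and na=False is an assumption that rows with a missing Events value should count as "no event".

```python
import pandas as pd

# Same sample data as in the walkthrough above.
df = pd.DataFrame({
    'id': [0, 1, 8, 9, 32, 31, 23],
    'Events': ['Rain', 'Rain', 'Fog-Rain', 'Rain-Snow',
               'Thunderstorm', 'Fog', 'Snow'],
})

events = ['Rain', 'Snow', 'Fog', 'Thunderstorm']

for event in events:
    # Boolean mask of rows whose Events text contains this event name,
    # converted to integers: True -> 1, False -> 0.
    df[event] = (
        df['Events']
        .str.contains(event, regex=False, na=False)
        .astype(int)
    )

print(df)
```

Because this is a substring match, it only works when no event name is contained inside another one, which holds for these four.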
Upvotes: 1