Reputation: 594
I'm one-hot encoding some categorical variables with some code that was provided to me. This line adds columns of 0s and 1s, each with a name in the format prefix_categoricalValue:
dataframe = pandas.concat([dataframe,pandas.get_dummies(dataframe[0], prefix='protocol')],axis=1).drop([0],axis=1)
I want each column to have its index as its name, not prefix_categoricalValue.
I know that I can do something like df.rename(columns={'prefix_categoricalValue': '0'}, inplace=True), but I'm not sure how to do it for all the columns that have that prefix.
This is an example of a part of the dataframe. Whether I decide to leave the local_address prefix or not, each category will have its name. Is it possible to rename the column with its index?
EDIT:
I'm trying to do this:
for column in dataframe:
    dataframe.rename(columns={column: 'new_name'}, inplace=True)
    print(column)
but I'm not exactly sure why it doesn't work.
Upvotes: 1
Views: 3772
Reputation: 446
Let's say your prefix is local_address_0.0.0.0. The following code renames every column that starts with the prefix you specify to the index that column has, according to the order in which the columns appear in the dataframe:
prefix = 'local_address_0.0.0.0'
cols = list(dataframe)  # column names, in order
for idx, val in enumerate(cols):
    if val.startswith(prefix):
        # note: index=str also converts the row labels to strings
        dataframe.rename(index=str, columns={val: idx}, inplace=True)
This will show a warning in the console:
python3.6/site-packages/pandas/core/frame.py:3027: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)
But it is just a warning; the column names of the dataframe are still updated. If you want to learn more about the warning, see "How to deal with SettingWithCopyWarning in Pandas?"
If someone knows how to do the same thing without a warning, please comment.
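One possible way to avoid the warning, sketched under the assumption that dataframe was created as a slice of another DataFrame (the usual trigger for SettingWithCopyWarning): build the whole mapping first and call rename once without inplace=True, so a new object is returned instead of a possible copy being modified. The sample data and the helper name rename_prefixed_to_position are made up for illustration.

```python
import pandas as pd


def rename_prefixed_to_position(frame, prefix):
    """Hypothetical helper: return a new DataFrame in which every column
    whose name starts with `prefix` is renamed to its positional index."""
    mapping = {
        name: position
        for position, name in enumerate(frame.columns)
        if str(name).startswith(prefix)
    }
    # Without inplace=True, rename returns a new DataFrame.
    return frame.rename(columns=mapping)


# Made-up sample data, only for illustration.
dataframe = pd.DataFrame({
    "port": [80, 443],
    "local_address_0.0.0.0": [1, 0],
    "local_address_127.0.0.1": [0, 1],
})

renamed = rename_prefixed_to_position(dataframe, "local_address_")
print(list(renamed.columns))
```

Assigning the result back (dataframe = rename_prefixed_to_position(dataframe, prefix)) gives the same column names as the loop above, and the row labels are left untouched.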
Upvotes: 1
Reputation: 3744
import pandas as pd

# 'dataframe' is the name of your data frame in the question, so that's what I use
# in my code below, although I suggest calling it 'data' or something similar instead,
# as 'DataFrame' is the name of the pandas class and the two are easy to confuse. But anyway...
features = ['list of column names you want one-hot encoded']
# for example, features = ['Cars', 'Model', 'Year', ... ]
for f in features:
    df = dataframe[[f]]
    # with an empty prefix and separator, each dummy column is named by its category value alone
    df2 = (pd.get_dummies(df, prefix='', prefix_sep='')
             .add_prefix(f + ' - '))
    # the new feature names will be "<old_feature_name> - <categorical_value>"
    # for example, "Cars" will get transformed to "Cars - Minivan", "Cars - Truck", etc
    # add the new one-hot encoded columns to the dataframe
    dataframe = pd.concat([dataframe, df2], axis=1)
    # you can remove the original column, if you don't need it anymore (optional)
    dataframe = dataframe.drop([f], axis=1)
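As a side note, the same naming scheme can probably be had from a single get_dummies call, since it accepts a columns list and a prefix_sep and uses each column name as the default prefix. This is only a sketch; the sample data is made up for illustration.

```python
import pandas as pd

# Made-up sample data, only for illustration.
dataframe = pd.DataFrame({
    "Cars": ["Minivan", "Truck", "Minivan"],
    "Price": [10, 20, 30],
})
features = ["Cars"]

# The default prefix is the column name, so the new columns are named
# "<column> - <category>" and the original columns are dropped.
encoded = pd.get_dummies(dataframe, columns=features, prefix_sep=" - ")
print(list(encoded.columns))
```

Unlike the loop, this appends the encoded columns after the untouched ones in one step, so there is no explicit concat or drop.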
Upvotes: 3
Reputation: 323356
IIUC
dummydf = pd.get_dummies(df.A)
dummydf.columns = ['A'] * dummydf.shape[1]
dummydf
Out[1171]:
   A  A
0  1  0
1  0  1
2  1  0
df
Out[1172]:
   A  B  C
0  a  b  1
1  b  a  2
2  a  c  3
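If the goal is the one stated in the question (each dummy column named by its position rather than by a repeated label), a variation on the same idea might look like the sketch below. It assumes the same small df as above; note that duplicate names such as 'A', 'A' make dummydf['A'] return both columns at once, which positional names avoid.

```python
import pandas as pd

# Same small example frame as above.
df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["b", "a", "c"], "C": [1, 2, 3]})

dummydf = pd.get_dummies(df["A"])
# Name each dummy column by its position: 0, 1, ...
dummydf.columns = range(dummydf.shape[1])
print(list(dummydf.columns))
```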
Upvotes: 0