Mactilda

Reputation: 393

One-hot encoding (dummies) for several columns, then concatenating with the original df in pandas

I have a df with several nominal categorical columns that I want to create dummies for. Here's a mock df (the Swedish column names mean fruit, weight, colour and taste):

import pandas as pd

data = {'Frukt':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Vikt':[23, 45, 31, 28, 62, 12, 44, 42, 23, 32], 
        'Färg':['grön', 'gul', 'röd', 'grön', 'grön', 'gul', 'röd', 'röd', 'gul', 'grön'], 
        'Smak':['god', 'sådär', 'supergod', 'rälig', 'rälig', 'supergod', 'god', 'god', 'rälig', 'god']} 

df = pd.DataFrame(data) 

I tried listing the columns I want dummies for:

nomcols = ['Färg', 'Smak']

for column in ['nomcols']:
    dummies = pd.get_dummies(df[column])

df[dummies.columns] = dummies

This was a tip from another question, but it didn't work. I have looked at the four other similar questions without luck, since most of them create dummies from ALL the columns in the df.

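What I would like is the original df with one dummy column per category of Färg and Smak added to it (the original post showed the desired output as an image, which is not available).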

Upvotes: 0

Views: 50

Answers (3)

Manoj Ravi

Reputation: 103

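nomcols = ['Färg', 'Smak']

for column in nomcols:
    dummies = pd.get_dummies(df[column])
    # join inside the loop so every column's dummies are kept, not only the last one's
    df = df.join(dummies)

Looping over the list nomcols itself, rather than over the one-element list ['nomcols'], should work.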
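Alternatively, here is a minimal sketch that builds the dummies in the same kind of loop but concatenates them onto the original frame in one step with `pd.concat(..., axis=1)`. It assumes the question's mock data; the names `parts` and `result` are illustrative only. `dtype=int` is passed so the new columns hold 0/1 rather than the True/False that pandas 2.x returns by default.

```python
import pandas as pd

# Mock data from the question
data = {'Frukt': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Vikt': [23, 45, 31, 28, 62, 12, 44, 42, 23, 32],
        'Färg': ['grön', 'gul', 'röd', 'grön', 'grön', 'gul', 'röd', 'röd', 'gul', 'grön'],
        'Smak': ['god', 'sådär', 'supergod', 'rälig', 'rälig', 'supergod', 'god', 'god', 'rälig', 'god']}
df = pd.DataFrame(data)

nomcols = ['Färg', 'Smak']

# Iterate over the list itself, not over the one-element list ['nomcols']
parts = []
for column in nomcols:
    # One 0/1 column per category; a Series input gets no name prefix
    parts.append(pd.get_dummies(df[column], dtype=int))

# Place all dummy columns next to the original columns in a single concat
result = pd.concat([df] + parts, axis=1)
print(result)
```

Collecting the pieces and concatenating once avoids growing `df` inside the loop; the original Färg and Smak columns are kept, so drop them afterwards if they are not needed.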

Upvotes: 0

jezrael

Reputation: 863781

Use get_dummies with the columns to encode passed as a list to the columns parameter. To drop the column-name prefix and separator from the new column names, set prefix and prefix_sep to empty strings:

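nomcols = ['Färg', 'Smak']

# prefix='' and prefix_sep='' keep only the category value as the column name;
# dtype=int gives 0/1 columns (pandas 2.x otherwise returns True/False)
df = pd.get_dummies(df, columns=nomcols, prefix='', prefix_sep='', dtype=int)
print(df)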
   Frukt  Vikt  grön  gul  röd  god  rälig  supergod  sådär
0      1    23     1    0    0    1      0         0      0
1      2    45     0    1    0    0      0         0      1
2      3    31     0    0    1    0      0         1      0
3      4    28     1    0    0    0      1         0      0
4      5    62     1    0    0    0      1         0      0
5      6    12     0    1    0    0      0         1      0
6      7    44     0    0    1    1      0         0      0
7      8    42     0    0    1    1      0         0      0
8      9    23     0    1    0    0      1         0      0
9     10    32     1    0    0    1      0         0      0
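For comparison, a short sketch (assuming the question's mock data; `default` and `bare` are illustrative names) of what the prefix arguments change. By default, `pd.get_dummies(df, columns=...)` names each new column `<column>_<category>` and removes the encoded columns; with `prefix=''` and `prefix_sep=''` only the category value remains.

```python
import pandas as pd

data = {'Frukt': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Vikt': [23, 45, 31, 28, 62, 12, 44, 42, 23, 32],
        'Färg': ['grön', 'gul', 'röd', 'grön', 'grön', 'gul', 'röd', 'röd', 'gul', 'grön'],
        'Smak': ['god', 'sådär', 'supergod', 'rälig', 'rälig', 'supergod', 'god', 'god', 'rälig', 'god']}
df = pd.DataFrame(data)
nomcols = ['Färg', 'Smak']

# Default naming: '<column>_<category>', e.g. 'Färg_grön'
default = pd.get_dummies(df, columns=nomcols, dtype=int)

# Empty prefix and separator: the bare category value, e.g. 'grön'
bare = pd.get_dummies(df, columns=nomcols, prefix='', prefix_sep='', dtype=int)

print(default.columns.tolist())
print(bare.columns.tolist())
```

The bare names are shorter, but they can collide if two encoded columns share a category value; the default prefix keeps them distinct.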

Upvotes: 1

ec2604

Reputation: 521

Your approach was almost right, but you wrote:

for column in ['nomcols']:
    dummies = pd.get_dummies(df[column])

That loops over a one-element list containing the string 'nomcols', so it looks up a column literally named 'nomcols', which doesn't exist. What you wanted was:

dummies = pd.get_dummies(df[nomcols])

This selects the columns whose names are inside the nomcols list.
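A minimal sketch of this approach, assuming the question's mock data (the names `dummies` and `result` are illustrative). It first shows the KeyError that the `['nomcols']` loop runs into, then encodes the selected columns and joins the dummies back onto the original df by index. Because the input to `get_dummies` is a DataFrame here, the new columns are prefixed with their source column, e.g. `Färg_grön`.

```python
import pandas as pd

data = {'Frukt': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Vikt': [23, 45, 31, 28, 62, 12, 44, 42, 23, 32],
        'Färg': ['grön', 'gul', 'röd', 'grön', 'grön', 'gul', 'röd', 'röd', 'gul', 'grön'],
        'Smak': ['god', 'sådär', 'supergod', 'rälig', 'rälig', 'supergod', 'god', 'god', 'rälig', 'god']}
df = pd.DataFrame(data)
nomcols = ['Färg', 'Smak']

# The loop in the question looks up a column literally named 'nomcols'
try:
    df['nomcols']
except KeyError as err:
    print('KeyError:', err)

# Selecting the listed columns gives a DataFrame; get_dummies encodes each
# of them and prefixes the new column names with the source column name
dummies = pd.get_dummies(df[nomcols], dtype=int)

# Join the dummy columns back onto the original frame (both share the same index)
result = df.join(dummies)
print(result.columns.tolist())
```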

Upvotes: 0
