Nivel

Reputation: 679

Method chaining with pandas function

Why can't I chain the get_dummies() function?

import pandas as pd

df = (pd
     .read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
     .drop(columns=['sepal_length'])
     .get_dummies()
)

This works fine:

df = (pd
     .read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
     .drop(columns=['sepal_length'])
)
df = pd.get_dummies(df)

Upvotes: 0

Views: 2036

Answers (2)

Akshay Sehgal

Reputation: 19322

You can't chain pd.get_dummies() because it is a top-level pandas function, not a pd.DataFrame method. However, assuming that:

  1. You have a single column left after you drop your columns in the previous step in the chain.
  2. That column holds strings.

... you can use pd.Series.str.get_dummies(), which is a Series method (reached through the .str accessor) and therefore chains.

### Dummy Dataframe
#       A  B
#    0  1  x
#    1  2  y
#    2  3  z

pd.read_csv(path).drop(columns=['A'])['B'].str.get_dummies()
   x  y  z
0  1  0  0
1  0  1  0
2  0  0  1
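The line above reads from an undefined `path`, so here is a self-contained sketch of the same chain. It assumes the dummy data is fed from an in-memory string through `io.StringIO` instead of a real file; the data simply mirrors the dummy DataFrame shown above.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file (same data as the dummy DataFrame above)
csv_text = "A,B\n1,x\n2,y\n3,z\n"

dummies = (
    pd.read_csv(io.StringIO(csv_text))
    .drop(columns=["A"])
    ["B"]               # select the remaining column as a Series
    .str.get_dummies()  # Series.str method, so it can be chained
)
print(dummies)
```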

NOTE: Make sure the object is a Series before you call get_dummies() on it. Here I select column ['B'] to get one, which makes the preceding pd.DataFrame.drop() call redundant :)

But this is only for example's sake.
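To see why the original chain fails, you can check where `get_dummies` actually lives. A minimal sketch, using a small made-up frame (the `species` column and its values are only for illustration): the function exists on the `pd` module, `DataFrame` has no attribute of that name, so `df.get_dummies()` raises `AttributeError`, while passing the frame to the function works.

```python
import pandas as pd

# Hypothetical sample frame; the column name and values are made up
df = pd.DataFrame({"species": ["setosa", "versicolor", "setosa"]})

# get_dummies is a module-level pandas function...
has_function = hasattr(pd, "get_dummies")
# ...but DataFrame defines no method with that name
has_method = hasattr(pd.DataFrame, "get_dummies")

try:
    df.get_dummies()  # the chained form from the question
    chained_ok = True
except AttributeError:
    chained_ok = False

# Passing the frame as the function's argument works
encoded = pd.get_dummies(df)
print(has_function, has_method, chained_ok)
print(list(encoded.columns))
```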

Upvotes: 0

Henry Ecker

Reputation: 35626

DataFrame.pipe can be helpful for chaining functions that are not DataFrame methods, like pd.get_dummies:

df = df.drop(columns=['sepal_length']).pipe(pd.get_dummies)
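For a plain callable, `pipe` just calls the function with the frame as its first argument. The sketch below makes that explicit with a hypothetical helper, `pipe_like`, which is not part of pandas and exists only to illustrate the idea. It checks that the `pipe` form, the helper, and a direct `pd.get_dummies` call give identical results on the sample frame used further down.

```python
import pandas as pd

def pipe_like(frame, func, *args, **kwargs):
    """Hypothetical helper: mimics DataFrame.pipe for a plain callable
    by calling func with the frame as the first argument."""
    return func(frame, *args, **kwargs)

df = pd.DataFrame({"sepal_length": 1, "a": list("ABACC"), "b": list("ACCAB")})
trimmed = df.drop(columns=["sepal_length"])

via_pipe = trimmed.pipe(pd.get_dummies)
via_helper = pipe_like(trimmed, pd.get_dummies)
via_call = pd.get_dummies(trimmed)

print(via_pipe.equals(via_helper) and via_pipe.equals(via_call))
```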

Or with a lambda:

df = (
    df.drop(columns=['sepal_length'])
        .pipe(lambda current_df: pd.get_dummies(current_df))
)
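pipe also forwards extra positional and keyword arguments to the function, so you can pass get_dummies options without a lambda. The lambda form is still handy when you want to build the call inline. A sketch, assuming you only want to encode column `a` of the sample frame below (`columns=["a"]` is just an illustrative choice):

```python
import pandas as pd

df = pd.DataFrame({"sepal_length": 1, "a": list("ABACC"), "b": list("ACCAB")})

# Keyword arguments after the callable are passed through to pd.get_dummies
via_kwargs = df.drop(columns=["sepal_length"]).pipe(pd.get_dummies, columns=["a"])

# Equivalent lambda form
via_lambda = df.drop(columns=["sepal_length"]).pipe(
    lambda current_df: pd.get_dummies(current_df, columns=["a"])
)

# Column b is left as-is; only column a is expanded into dummy columns
print(list(via_kwargs.columns))
```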

Sample DataFrame:

df = pd.DataFrame({'sepal_length': 1, 'a': list('ABACC'), 'b': list('ACCAB')})

df:

   sepal_length  a  b
0             1  A  A
1             1  B  C
2             1  A  C
3             1  C  A
4             1  C  B

Sample Output:

df = df.drop(columns=['sepal_length']).pipe(pd.get_dummies)

df:

   a_A  a_B  a_C  b_A  b_B  b_C
0    1    0    0    1    0    0
1    0    1    0    0    0    1
2    1    0    0    0    0    1
3    0    0    1    1    0    0
4    0    0    1    0    1    0
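The 0/1 output above matches older pandas. Since pandas 2.0 the default dtype of get_dummies is bool, so the same call shows True/False. If you want integers, you can pass dtype=int through pipe. A minimal sketch on the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({"sepal_length": 1, "a": list("ABACC"), "b": list("ACCAB")})

# dtype is forwarded by pipe to pd.get_dummies, giving 0/1 integer columns
encoded = df.drop(columns=["sepal_length"]).pipe(pd.get_dummies, dtype=int)
print(encoded)
```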

Upvotes: 3
