Reputation: 679
Why can't I chain the get_dummies() function?
import pandas as pd
df = (pd
.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
.drop(columns=['sepal_length'])
.get_dummies()
)
This works fine:
df = (pd
.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
.drop(columns=['sepal_length'])
)
df = pd.get_dummies(df)
Upvotes: 0
Views: 2036
Reputation: 19322
You can't chain pd.get_dummies() because it is a top-level pandas function, not a pd.DataFrame method. However, assuming -
... you can use pd.Series.str.get_dummies(), which is a Series-level method.
### Dummy Dataframe
# A B
# 0 1 x
# 1 2 y
# 2 3 z
pd.read_csv(path).drop(columns=['A'])['B'].str.get_dummies()
x y z
0 1 0 0
1 0 1 0
2 0 0 1
NOTE: Make sure the object is a Series before you call the get_dummies() method. In this case, I select column ['B'] to do that, which makes the preceding pd.DataFrame.drop() call unnecessary :) But this is only for example's sake.
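For anyone who wants to run the idea above without a CSV file, here is a minimal sketch. It assumes an in-memory frame that mirrors the dummy data (the name `df` and the inline construction are mine, standing in for `pd.read_csv(path)`):

```python
import pandas as pd

# Hypothetical in-memory stand-in for pd.read_csv(path), mirroring the dummy data
df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})

# get_dummies lives on the pandas module, not on the DataFrame class,
# which is why it cannot be chained directly
print(hasattr(pd.DataFrame, "get_dummies"))

# Selecting a single column gives a Series; its .str accessor has get_dummies
dummies = df.drop(columns=["A"])["B"].str.get_dummies()
print(dummies)
```

Note that the `.str` accessor only works on string-like columns, so this route suits a single text column rather than a whole frame.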
Upvotes: 0
Reputation: 35626
DataFrame.pipe can be helpful for chaining methods or function calls that are not natively attached to the DataFrame, like pd.get_dummies:
df = df.drop(columns=['sepal_length']).pipe(pd.get_dummies)
Or with a lambda:
df = (
df.drop(columns=['sepal_length'])
.pipe(lambda current_df: pd.get_dummies(current_df))
)
Sample DataFrame:
df = pd.DataFrame({'sepal_length': 1, 'a': list('ABACC'), 'b': list('ACCAB')})
df:
sepal_length a b
0 1 A A
1 1 B C
2 1 A C
3 1 C A
4 1 C B
Sample Output:
df = df.drop(columns=['sepal_length']).pipe(pd.get_dummies)
df:
a_A a_B a_C b_A b_B b_C
0 1 0 0 1 0 0
1 0 1 0 0 0 1
2 1 0 0 0 0 1
3 0 0 1 1 0 0
4 0 0 1 0 1 0
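If you need to pass options to pd.get_dummies, pipe forwards extra positional and keyword arguments to the function, so a lambda is not required. A small sketch using the same sample frame follows. The choice to encode only column 'a' is just for illustration, and dtype=int is an assumption on my part to get 0/1 output, since pandas 2.0 and later return booleans by default:

```python
import pandas as pd

# Same sample frame as above
df = pd.DataFrame({"sepal_length": 1, "a": list("ABACC"), "b": list("ACCAB")})

result = (
    df.drop(columns=["sepal_length"])
    # Keyword arguments after the function are forwarded to pd.get_dummies:
    # encode only column 'a' and request integer 0/1 columns
    .pipe(pd.get_dummies, columns=["a"], dtype=int)
)
print(result)
```

Column 'b' is left untouched here, and the dummy columns for 'a' are appended after it.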
Upvotes: 3