Reputation: 1482
If I have a dataframe with only two datatypes like below:
import pandas as pd
d = {'col1': [1, 2], 'col2': ['jack', 'bill'], 'col3': [4, 5], 'col4': ['megan', 'sarah']}
df = pd.DataFrame(data=d)
print(df)
   col1  col2  col3   col4
0     1  jack     4  megan
1     2  bill     5  sarah
print(df.dtypes)
col1     int64
col2    object
col3     int64
col4    object
Is there a way to stack these columns based only on data type? The end result would be:
   col1   col2
0     1   jack
1     2   bill
2     4  megan
3     5  sarah
It's not necessary for the final column names to remain the same.
Upvotes: 2
Views: 94
Reputation: 59304
If the number of columns differs between dtypes, you can use the default DataFrame constructor, which pads the shorter groups with missing values. Borrowing Quang's idea of groupby(axis=1):
pd.DataFrame(df.groupby(df.dtypes, axis=1).apply(lambda s: list(s.values.ravel())).tolist()).T
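To make the uneven case concrete, here is a minimal sketch that avoids groupby(axis=1), which is deprecated in recent pandas releases. The three-column frame and the helper name stack_by_dtype are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical sample: two int columns but only one string column.
df = pd.DataFrame({'col1': [1, 2], 'col2': ['jack', 'bill'], 'col3': [4, 5]})

def stack_by_dtype(frame):
    """Stack same-dtype columns into one Series per dtype (illustrative helper)."""
    pieces = {}
    for dtype in frame.dtypes.unique():
        cols = [c for c in frame.columns if frame[c].dtype == dtype]
        # Row-major flattening, matching .values.ravel() in the one-liner above.
        pieces[str(dtype)] = pd.Series(frame[cols].to_numpy().ravel())
    # concat aligns on the index, so shorter groups are padded with NaN.
    return pd.concat(pieces, axis=1)

result = stack_by_dtype(df)
print(result)
```

The integer group ends up with four rows and the string group with two, so the last two rows of the string column are NaN.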
Upvotes: 2
Reputation: 323396
Why not give a for loop a chance:
pd.DataFrame([ df.loc[:,df.dtypes==x].values.ravel() for x in df.dtypes.unique()]).T
Out[46]:
   0      1
0  1   jack
1  4  megan
2  2   bill
3  5  sarah
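Note that ravel() flattens row by row, so the values come out interleaved (1, 4, 2, 5) rather than in the order shown in the question (1, 2, 4, 5). If the order matters, one possible tweak, assuming NumPy's column-major flattening is acceptable here, is to pass order="F":

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': ['jack', 'bill'],
                   'col3': [4, 5], 'col4': ['megan', 'sarah']})

# order="F" flattens column by column instead of row by row,
# so all of col1 comes before col3, and all of col2 before col4.
columns = [df.loc[:, df.dtypes == x].to_numpy().ravel(order="F")
           for x in df.dtypes.unique()]
result = pd.DataFrame(columns).T
print(result)
```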
Upvotes: 3
Reputation: 150825
This works with your sample data, but I'm not sure whether it works with general data:
(df.groupby(df.dtypes, axis=1)
.apply(lambda x: (x.stack().reset_index(drop=True)))
)
Output:
   int64 object
0      1   jack
1      4  megan
2      2   bill
3      5  sarah
Upvotes: 4