Reputation: 31
Suppose I have the data frame:
import pandas as pd

df = pd.DataFrame({'a': [1,1,1,1, 0,0,0],
                   'b': [1,1,1,1, 0,0,0],
                   'c': [1,1,1,1, 0,0,0],
                   'd': [0,0,0,0, 1,1,1],
                   'e': [0,0,0,0, 1,1,1],
                   'f': [0,0,0,0, 1,1,1]})
or
a b c d e f
0 1 1 1 0 0 0
1 1 1 1 0 0 0
2 1 1 1 0 0 0
3 1 1 1 0 0 0
4 0 0 0 1 1 1
5 0 0 0 1 1 1
6 0 0 0 1 1 1
Is there an efficient way to collapse all columns that contain identical values into a single column each, giving a data frame like:
final_df = pd.DataFrame({'a/b/c': [1,1,1,1, 0,0,0],
                         'd/e/f': [0,0,0,0, 1,1,1]})
or
a/b/c d/e/f
0 1 0
1 1 0
2 1 0
3 1 0
4 0 1
5 0 1
6 0 1
Upvotes: 2
Views: 61
Reputation: 23227
You can use:
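(df.T                                   # each original column becomes a row
   .groupby(df.index.tolist())          # keys 0..6 are df.T's columns: group rows by all values
   .agg(lambda x: '/'.join(x.index))    # join each group's labels, e.g. 'a/b/c'
   .reset_index(name='col')             # joined labels go to column 'col'
   .set_index('col')                    # ... and become the row index
   .T                                   # transpose back: one column per group
   .sort_index(axis=1)                  # order the new columns by name
   .rename_axis(columns=None)           # drop the leftover 'col' axis name
)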
Result:
a/b/c d/e/f
0 1 0
1 1 0
2 1 0
3 1 0
4 0 1
5 0 1
6 0 1
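The chain above works by grouping the original columns on their complete contents. A minimal loop-based sketch of the same grouping idea, without the transpose, might look like the following; the names `groups` and `collapsed` are illustrative, and it assumes each column's values are hashable once converted to a tuple.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 0, 0, 0],
                   'b': [1, 1, 1, 1, 0, 0, 0],
                   'c': [1, 1, 1, 1, 0, 0, 0],
                   'd': [0, 0, 0, 0, 1, 1, 1],
                   'e': [0, 0, 0, 0, 1, 1, 1],
                   'f': [0, 0, 0, 0, 1, 1, 1]})

# Map each distinct column content (as a hashable tuple) to the labels sharing it.
# Dicts keep insertion order, so groups appear in first-seen column order.
groups = {}
for col in df.columns:
    groups.setdefault(tuple(df[col]), []).append(col)

# One output column per group, named by joining its labels with '/'.
collapsed = pd.DataFrame({'/'.join(cols): df[cols[0]] for cols in groups.values()})
print(collapsed)
```

Converting each column to a tuple is what lets it serve as a dict key; columns end up in the same group exactly when all of their values match.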
Upvotes: 2
Reputation: 153510
This is fun:
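df.apply(
    lambda s: s.reset_index(name="val")   # labels in 'index', values in 'val'
    .groupby("val")["index"]              # group the labels by value
    .agg("/".join)                        # e.g. 'a/b/c' and 'd/e/f'
    .reset_index()
    .set_index("index")                   # joined labels become the index
    .squeeze(),                           # one-column frame -> Series
    axis=1,                               # run the lambda once per row
)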
Output:
index a/b/c d/e/f
0 1 0
1 1 0
2 1 0
3 1 0
4 0 1
5 0 1
6 0 1
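To see what the lambda does, it can help to run its first steps on a single row. The sketch below assumes the same `df` as in the question and applies those steps to the first row only; `row`, `long` and `joined` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 0, 0, 0],
                   'b': [1, 1, 1, 1, 0, 0, 0],
                   'c': [1, 1, 1, 1, 0, 0, 0],
                   'd': [0, 0, 0, 0, 1, 1, 1],
                   'e': [0, 0, 0, 0, 1, 1, 1],
                   'f': [0, 0, 0, 0, 1, 1, 1]})

row = df.iloc[0]  # first row: a, b, c are 1; d, e, f are 0

# Turn the row into a frame with the column labels in 'index' and the values in 'val'.
long = row.reset_index(name="val")

# Join the labels that share a value within this row.
joined = long.groupby("val")["index"].agg("/".join)
print(joined)
```

Because the labels are grouped within each row, this recovers the column groups only when every row splits the columns the same way, as it does in this example; if `a` and `d` were equal in some row, that row would produce a label containing both.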
Upvotes: 4
Reputation: 142
You can simply keep one column from each group of identical columns and drop the rest:
import pandas as pd

df = pd.DataFrame({'a': [1,1,1,1, 0,0,0],
                   'b': [1,1,1,1, 0,0,0],
                   'c': [1,1,1,1, 0,0,0],
                   'd': [0,0,0,0, 1,1,1],
                   'e': [0,0,0,0, 1,1,1],
                   'f': [0,0,0,0, 1,1,1]})
# keep only 'a' and 'd'
df.drop(['b', 'c', 'e', 'f'], axis=1, inplace=True)
Now you'll have only two columns, 'a' and 'd'.
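Rather than listing the duplicates by hand, they can be found automatically. A minimal sketch, assuming the first column of each identical group is the one to keep, uses `DataFrame.duplicated` on the transposed frame; `dupes` and `kept` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 0, 0, 0],
                   'b': [1, 1, 1, 1, 0, 0, 0],
                   'c': [1, 1, 1, 1, 0, 0, 0],
                   'd': [0, 0, 0, 0, 1, 1, 1],
                   'e': [0, 0, 0, 0, 1, 1, 1],
                   'f': [0, 0, 0, 0, 1, 1, 1]})

# On the transpose each original column is a row; duplicated() marks every
# row (i.e. original column) that is identical to an earlier one.
dupes = df.T.duplicated()

# Keep only the columns that do not repeat an earlier column.
kept = df.loc[:, ~dupes]
print(kept.columns.tolist())
```

This keeps the original names `a` and `d` rather than building joined names such as `a/b/c`.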
Upvotes: -1