user14946571

Reputation: 31

Pandas. Combine all columns in a dataframe that have the same values into single columns

Suppose I have the data frame

import pandas as pd

df = pd.DataFrame({'a': [1,1,1,1, 0,0,0],
                   'b': [1,1,1,1, 0,0,0],
                   'c': [1,1,1,1, 0,0,0],
                   'd': [0,0,0,0, 1,1,1],
                   'e': [0,0,0,0, 1,1,1],
                   'f': [0,0,0,0, 1,1,1]})

or

   a  b  c  d  e  f
0  1  1  1  0  0  0
1  1  1  1  0  0  0
2  1  1  1  0  0  0
3  1  1  1  0  0  0
4  0  0  0  1  1  1
5  0  0  0  1  1  1
6  0  0  0  1  1  1

Is there an efficient way to collapse all columns that hold identical values into single columns, giving a data frame that looks like

final_df = pd.DataFrame({'a/b/c': [1,1,1,1, 0,0,0],
                         'd/e/f': [0,0,0,0, 1,1,1]})

or

   a/b/c  d/e/f
0      1      0
1      1      0
2      1      0
3      1      0
4      0      1
5      0      1
6      0      1

Upvotes: 2

Views: 61

Answers (3)

SeaBean

Reputation: 23227

You can use:

(df.T.groupby(df.index.tolist())
   .agg(lambda x: '/'.join(x.index))
   .reset_index(name='col')
   .set_index('col')
   .T
   .sort_index(axis=1)
   .rename_axis(columns=None)
)

Result:

   a/b/c  d/e/f
0      1      0
1      1      0
2      1      0
3      1      0
4      0      1
5      0      1
6      0      1
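
For comparison, here is a minimal sketch of the same idea without the transpose/groupby chain. It is not part of the original answer: the names `groups` and `final_df` are just illustrative, and it assumes the column labels are strings and the data contains no NaN values (NaN entries would not compare equal inside the tuples).

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 0, 0, 0],
                   'b': [1, 1, 1, 1, 0, 0, 0],
                   'c': [1, 1, 1, 1, 0, 0, 0],
                   'd': [0, 0, 0, 0, 1, 1, 1],
                   'e': [0, 0, 0, 0, 1, 1, 1],
                   'f': [0, 0, 0, 0, 1, 1, 1]})

# Map each distinct column content (as a hashable tuple) to the names that share it.
groups = {}
for name in df.columns:
    groups.setdefault(tuple(df[name].tolist()), []).append(name)

# Build one column per group, named by joining the member names with '/'.
final_df = pd.DataFrame(
    {'/'.join(names): list(values) for values, names in groups.items()},
    index=df.index,
)
print(final_df)
```

Because the dictionary preserves insertion order, the combined columns appear in the order in which each group is first seen in `df`.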

Upvotes: 2

Scott Boston

Reputation: 153510

This is fun:

df.apply(
    lambda s: s.reset_index(name="val")
    .groupby("val")["index"]
    .agg("/".join)
    .reset_index()
    .set_index("index")
    .squeeze(),
    axis=1,
)

Output:

index  a/b/c  d/e/f
0          1      0
1          1      0
2          1      0
3          1      0
4          0      1
5          0      1
6          0      1

Upvotes: 4

Bibekjit Singh

Reputation: 142

You can keep just one column from each group of identical columns and drop the rest:

df = pd.DataFrame({'a': [1,1,1,1, 0,0,0],
                   'b': [1,1,1,1, 0,0,0],
                   'c': [1,1,1,1, 0,0,0],
                   'd': [0,0,0,0, 1,1,1],
                   'e': [0,0,0,0, 1,1,1],
                   'f': [0,0,0,0, 1,1,1]})

# keeping only 'a' and 'd'

df.drop(['b','c','e','f'], axis=1, inplace=True)

Now you'll have only 2 columns, 'a' and 'd'.
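
If the duplicate columns are not known in advance, the same "keep the first, drop the rest" step can probably be automated. The sketch below is an assumption-labeled illustration rather than part of the original answer (`is_repeat` and `deduped` are made-up names). Note that, like the manual drop, it keeps the original labels 'a' and 'd' and does not produce the joined 'a/b/c' style names the question asks for.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 0, 0, 0],
                   'b': [1, 1, 1, 1, 0, 0, 0],
                   'c': [1, 1, 1, 1, 0, 0, 0],
                   'd': [0, 0, 0, 0, 1, 1, 1],
                   'e': [0, 0, 0, 0, 1, 1, 1],
                   'f': [0, 0, 0, 0, 1, 1, 1]})

# Transposing turns columns into rows, so duplicated() flags every
# column whose values repeat those of an earlier column.
is_repeat = df.T.duplicated()

# Keep only the first column of each group of identical columns.
deduped = df.loc[:, ~is_repeat]
print(deduped.columns.tolist())  # ['a', 'd']
```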

Upvotes: -1
