baxx
baxx

Reputation: 4725

Group columns in pandas dataframe and reduce amount

Not too sure how to phrase this, but I can give a pretty clear example.

Given the following data:

df = pd.DataFrame(
    {"a": [1, 0, 0], "b": [0, 1, 0], "c": [1, 0, 1], "d": [0, 0, 1], "e": [1, 0, 0]}
)

which looks as

   a  b  c  d  e
0  1  0  1  0  1
1  0  1  0  0  0
2  0  0  1  1  0

I would like to convert it to

df_r = pd.DataFrame({"up": [1, 1, 0], "down": [1, 0, 1], "neither": [1, 0, 0]})

which looks as

   up  down  neither
0   1     1        1
1   1     0        0
2   0     1        0

The solution should not depend on the shape of the above data - either the shape of the input or the output. It should be able to take a set of columns and, according to some logic, process them into a different dataframe where the meanings are represented without duplication.

Which columns are which are known already, its coding the logic which isn't clear.

Here's some pseudo code

upcols = ['a', 'b']
downcols = ['c','d']


if original_column in upcols
    new_column = up
elif original_column in downcols
    new_column = down
else
    new_column = neither

<remove duplicate columns>

Upvotes: 1

Views: 51

Answers (1)

Andrej Kesely
Andrej Kesely

Reputation: 195553

If I understand you correctly, you can use any(axis=1) on columns specified in upcols and downcols:

df = pd.DataFrame(
    {"a": [1, 0, 0], "b": [0, 1, 0], "c": [1, 0, 1], "d": [0, 0, 1], "e": [1, 0, 0]}
)

upcols = ['a', 'b']
downcols = ['c','d']

df['neither'] = df[[c for c in df.columns if c not in upcols and c not in downcols]].any(axis=1).astype(int)
df['up'] = df[upcols].any(axis=1).astype(int)
df['down'] = df[downcols].any(axis=1).astype(int)

print(df)

Prints:

   a  b  c  d  e  neither  up  down
0  1  0  1  0  1        1   1     1
1  0  1  0  0  0        0   1     0
2  0  0  1  1  0        0   0     1

To get only up, down, neither then you can do additionally:

df = df[['up', 'down', 'neither']]
print(df)

Prints:

   up  down  neither
0   1     1        1
1   1     0        0
2   0     1        0

Upvotes: 3

Related Questions