Reputation: 4725
Not too sure how to phrase this, but I can give a pretty clear example.
Given the following data:
df = pd.DataFrame(
{"a": [1, 0, 0], "b": [0, 1, 0], "c": [1, 0, 1], "d": [0, 0, 1], "e": [1, 0, 0]}
)
which looks as
a b c d e
0 1 0 1 0 1
1 0 1 0 0 0
2 0 0 1 1 0
I would like to convert it to
df_r = pd.DataFrame({"up": [1, 1, 0], "down": [1, 0, 1], "neither": [1, 0, 0]})
which looks as
up down neither
0 1 1 1
1 1 0 0
2 0 1 0
The solution should not depend on the shape of the above data - either the shape of the input or the output. It should be able to take a set of columns and, according to some logic, process them into a different dataframe where the meanings are represented without duplication.
Which columns are which are known already, its coding the logic which isn't clear.
Here's some pseudo code
upcols = ['a', 'b']
downcols = ['c','d']
if original_column in upcols
new_column = up
elif original_column in downcols
new_column = down
else
new_column = neither
<remove duplicate columns>
Upvotes: 1
Views: 51
Reputation: 195553
If I understand you correctly, you can use any(axis=1)
on columns specified in upcols
and downcols
:
df = pd.DataFrame(
{"a": [1, 0, 0], "b": [0, 1, 0], "c": [1, 0, 1], "d": [0, 0, 1], "e": [1, 0, 0]}
)
upcols = ['a', 'b']
downcols = ['c','d']
df['neither'] = df[[c for c in df.columns if c not in upcols and c not in downcols]].any(axis=1).astype(int)
df['up'] = df[upcols].any(axis=1).astype(int)
df['down'] = df[downcols].any(axis=1).astype(int)
print(df)
Prints:
a b c d e neither up down
0 1 0 1 0 1 1 1 1
1 0 1 0 0 0 0 1 0
2 0 0 1 1 0 0 0 1
To get only up
, down
, neither
then you can do additionally:
df = df[['up', 'down', 'neither']]
print(df)
Prints:
up down neither
0 1 1 1
1 1 0 0
2 0 1 0
Upvotes: 3