Aiha
Aiha

Reputation: 51

Pandas stack() if columns have a specific value

I am trying to stack this table based on ID column but only considering columns [A-D] where the value is 1 and not 0.

Current df:

ID A B C D
1 1 0 0 1
3 0 1 0 1
7 1 0 1 1
8 1 0 0 0

What I want:

ID LETTER
1 A
1 D
3 B
3 D
7 A
7 C
7 D
8 A

The following code works but I need a more efficient solution as I have a df with 93434 rows x 12377 columns.

stacked_df = df.set_index('ID').stack().reset_index(name='has_letter').rename(columns={'level_1':'LETTER'})
stacked_df = stacked_df[stacked_df['has_letter']==1].reset_index(drop=True)
stacked_df.drop(['has_letter'], axis=1, inplace=True)

Upvotes: 1

Views: 274

Answers (3)

cottontail
cottontail

Reputation: 23499

If the column labels are just letters, then you can use .dot() product to multiply the column labels with the dummy values and expand the resulting strings to for the desired result.

(
  df.set_index('ID')
  .dot(df.columns[1:])        # use inner product of column names and values
  .apply(list)                # separate each letter
  .explode()                  # explode each list
  .reset_index(name='LETTER') # reset index for df
)

However, if the column labels are words (e.g. instead of "A", it is "col1"), then try the following approach where instead of .dot(), .mul() is used to convert the column labels into column values.

(
  df
  .set_index('ID')
  .mul(df.columns[1:])
  .stack()
  .loc[lambda x: x!=""]
  .droplevel(level=1)
  .reset_index(name="LETTERS")
)

Given the input, both methods result in the expected output:

   ID LETTER
0   1      A
1   1      D
2   3      B
3   3      D
4   7      A
5   7      C
6   7      D
7   8      A

Upvotes: 2

Andrej Kesely
Andrej Kesely

Reputation: 195643

Try:

print(
    df.set_index("ID")
    .apply(lambda x: x.index[x == 1], axis=1)
    .reset_index()
    .explode(0)
    .rename(columns={0: "LETTERS"})
)

Prints:

   ID LETTERS
0   1       A
0   1       D
1   3       B
1   3       D
2   7       A
2   7       C
2   7       D
3   8       A

Or:

x = df.set_index("ID").stack()
print(
    x[x == 1]
    .reset_index()
    .drop(columns=0)
    .rename(columns={"level_1": "LETTER"})
)

Prints:

   ID LETTER
0   1      A
1   1      D
2   3      B
3   3      D
4   7      A
5   7      C
6   7      D
7   8      A

Upvotes: 2

mozway
mozway

Reputation: 262634

You can mask the non-1 values and stack to remove the NaNs:

df2 = df.rename_axis(columns='LETTERS').set_index('ID')
stacked_df = (df2.where(df2.eq(1)).stack()
                 .reset_index().iloc[:,:2]
              )

Output:

   ID LETTERS
0   1       A
1   1       D
2   3       B
3   3       D
4   7       A
5   7       C
6   7       D
7   8       A

Upvotes: 2

Related Questions