Hiwot

Reputation: 588

Create a new category based on column values: Pandas

I have the following dataframe

df

ID  Col_1   Col_2  Col_3 
1     0       1     1  
2     1       0     0
3     1       1     1
4     1       1     0
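
For anyone who wants to reproduce this, the sample frame above could be built roughly as follows. This is a minimal sketch; the integer dtype is an assumption, since only the printed table is shown.

```python
import pandas as pd

# Sample data copied from the table above (dtypes assumed to be integers)
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Col_1": [0, 1, 1, 1],
        "Col_2": [1, 0, 1, 1],
        "Col_3": [1, 0, 1, 0],
    }
)
print(df)
```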

I would like to check whether each column other than ID contains 0 values. If it does, write the column name into a new column, Category. If a row has more than one 0 value, duplicate the row, once per column, with a different Category value each time. If a row has no 0 values at all, leave its Category value empty.

I would like to get the dataframe as follows.

ID  Col_1   Col_2  Col_3   Category
1     0       1     1         Col_1
2     1       0     0         Col_2
2     1       0     0         Col_3
3     1       1     1
4     1       1     0         Col_3

Can anyone help with this?

Upvotes: 0

Views: 1670

Answers (4)

G.G

Reputation: 765

import pandas as pd

# For each row, list the non-ID columns whose value is 0
col1 = df1.drop(columns="ID").apply(lambda ss: ss[ss.eq(0)].index.to_list(), axis=1)
# One output row per listed column; rows without any 0 get an empty Category
df1.assign(Category=col1).explode("Category").fillna({"Category": ""})

output:

   ID  Col_1  Col_2  Col_3 Category
0   1      0      1      1    Col_1
1   2      1      0      0    Col_2
1   2      1      0      0    Col_3
2   3      1      1      1         
3   4      1      1      0    Col_3

Upvotes: 0

Sanaz0

Reputation: 1

How about this:

df['category'] = (df[['col1', 'col2', 'col3']]
                  .apply(lambda row: row[row == 0].index.values, axis=1))

Output:

      col1  col2  col3      category
ID                                
0      1     0     1        [col2]
1      0     0     1  [col1, col2]
2      1     1     0        [col3]

As you can see, ID is the index in this example, but it doesn't have to be, since you can select which columns to work with. Then, of course, you can run:

df.explode('category')
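
Applied to the data from the question, the two steps might look like the sketch below. The column names `Col_1` to `Col_3` come from the question; the list conversion and the trailing `fillna` are additions here (not part of the snippet above), so that rows without any 0 show an empty string instead of NaN.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Col_1": [0, 1, 1, 1],
        "Col_2": [1, 0, 1, 1],
        "Col_3": [1, 0, 1, 0],
    }
)

value_cols = ["Col_1", "Col_2", "Col_3"]

# Step 1: per row, collect the names of the columns that hold a 0
df["Category"] = df[value_cols].apply(
    lambda row: row[row == 0].index.tolist(), axis=1
)

# Step 2: one row per collected name; an empty list becomes NaN, then ""
out = df.explode("Category").fillna({"Category": ""})
print(out)
```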

Upvotes: 0

jezrael

Reputation: 862641

If you need the names of all columns that contain 0, use matrix multiplication with DataFrame.dot, then DataFrame.explode on the split values (performance on a large DataFrame will probably be worse):

df['Category'] = (df.iloc[:, 1:].eq(0).dot(df.columns[1:] + ',')
                     .str.strip(',').str.split(','))
                 
df = df.explode('Category')
print (df)
   ID  Col_1  Col_2  Col_3 Category
0   1      0      1      1    Col_1
1   2      1      0      0    Col_2
1   2      1      0      0    Col_3
2   3      1      1      1         
3   4      1      1      0    Col_3
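
To see why the dot trick works, it can help to look at the intermediate values. The following is only an illustrative sketch on the question's data; the names `mask`, `joined` and `lists` are made up here.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Col_1": [0, 1, 1, 1],
        "Col_2": [1, 0, 1, 1],
        "Col_3": [1, 0, 1, 0],
    }
)

# Boolean frame: True where a non-ID column equals 0
mask = df.iloc[:, 1:].eq(0)

# True * "Col_1," keeps the name, False * "Col_1," gives "",
# and the row-wise sum concatenates the surviving names
joined = mask.dot(df.columns[1:] + ",")

# Drop the trailing comma and split back into a list per row
lists = joined.str.strip(",").str.split(",")

print(joined.tolist())
print(lists.tolist())
```

A row without any 0 ends up as a list holding one empty string, which is why `explode` leaves an empty Category there rather than NaN.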

EDIT:

s = df.set_index('ID').stack()
df = (df.join(s[s.eq(0)].reset_index(level=1)['level_1'].rename('Category'), on='ID')
        .fillna({'Category':''}))
print (df)
   ID  Col_1  Col_2  Col_3 Category
0   1      0      1      1    Col_1
1   2      1      0      0    Col_2
1   2      1      0      0    Col_3
2   3      1      1      1         
3   4      1      1      0    Col_3

Or:

s = df.melt('ID').query('value == 0').set_index('ID')['variable']
df = df.join(s.rename('Category'), on='ID').fillna({'Category':''})
print (df)
   ID  Col_1  Col_2  Col_3 Category
0   1      0      1      1    Col_1
1   2      1      0      0    Col_2
1   2      1      0      0    Col_3
2   3      1      1      1         
3   4      1      1      0    Col_3
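
The melt variant is easier to follow when the intermediate long-format Series is printed. Again, this is just a sketch on the question's data, with `out` as a made-up name for the result.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Col_1": [0, 1, 1, 1],
        "Col_2": [1, 0, 1, 1],
        "Col_3": [1, 0, 1, 0],
    }
)

# Long format: one row per (ID, column) pair, keeping only the zeros
s = df.melt("ID").query("value == 0").set_index("ID")["variable"]
print(s)

# Joining on ID repeats a row once per matching column name;
# IDs without a match get NaN, replaced here by an empty string
out = df.join(s.rename("Category"), on="ID").fillna({"Category": ""})
print(out)
```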

Upvotes: 1

mozway

Reputation: 260640

One option is to mask the non-zero values, stack to get rid of them, and join to add the new column, which duplicates the rows as expected:

s = (df.drop('ID', axis=1).mask(lambda d: d.ne(0)).stack()
       .reset_index(level=1)['level_1'].rename('Category'))

df2 = df.join(s)

output:

   ID  Col_1  Col_2  Col_3 Category
0   1      0      1      1    Col_1
1   2      1      0      0    Col_2
1   2      1      0      0    Col_3
2   3      1      1      1         
3   4      1      1      0    Col_3
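
Depending on the pandas version, `stack` may or may not drop the masked (NaN) entries on its own. A slightly more defensive variant, which is an assumption on my part and not part of the answer above, adds an explicit `dropna` and fills the missing Category with an empty string to match the printed output:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Col_1": [0, 1, 1, 1],
        "Col_2": [1, 0, 1, 1],
        "Col_3": [1, 0, 1, 0],
    }
)

s = (
    df.drop(columns="ID")
    .mask(lambda d: d.ne(0))        # non-zero cells become NaN
    .stack()                        # long format: (row, column) -> value
    .dropna()                       # explicit, in case stack kept the NaN
    .reset_index(level=1)["level_1"]  # keep only the column names
    .rename("Category")
)

# Index-aligned join: a row index that appears twice in s is duplicated
df2 = df.join(s).fillna({"Category": ""})
print(df2)
```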

Upvotes: 1
