buddingprogrammer
Reputation: 29

Turn the values of one column into new columns and fill them with status_color where the other columns are duplicated

I have columns A, B, C, D and status_color. Column A holds values such as X1, X2, X3, X4. I want to create one new column per value of A and, for each combination of B, C, D, mark the A values that share that combination (with 1, or with the status_color as in the expected output below) and put 0 everywhere else.

Please provide an answer using PySpark or pandas.

Input

A   B   C   D   status_color
X1  a   b   c   red
X2  a   a   b   green
X3  a   a   b   red
X4  a   b   c   green

Output

B   C   D   X1   X2     X3   X4
a   b   c   red  0      0    green
a   a   b   0    green  red  0

I tried to find the duplicated rows and then create a duplicate_flag column, intending to print status_color where the other columns are duplicated:

df['duplicate_flag'] = df.duplicated(subset=['B', 'C', 'D'])

My problem is that I don't know how to compare this with column A and print the result in the columns X1, X2, X3, X4.

Can anyone help with Python? I am new to Python.

Upvotes: 2

Views: 67

Answers (2)

mozway

Reputation: 262149

Use pandas.crosstab:

out = (pd.crosstab([df['B'], df['C'], df['D']], df['A'])
         .clip(upper=1) # only if you expect duplicates
         .reset_index().rename_axis(columns=None)
       )

output:

   B  C  D  X1  X2  X3  X4
0  a  a  b   0   1   1   0
1  a  b  c   1   0   0   1
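
The table above holds 1 where the question's expected output shows the status color. As a rough sketch of one way to get the colors into the cells, pivot_table (which crosstab builds on) can take status_color as its values. The DataFrame below is rebuilt from the question's Input, and aggfunc="first" is an assumption: it presumes each (B, C, D, A) combination occurs at most once, or that the first color seen is the one wanted.

```python
import pandas as pd

# Sample data rebuilt from the question's Input table
df = pd.DataFrame({
    "A": ["X1", "X2", "X3", "X4"],
    "B": ["a", "a", "a", "a"],
    "C": ["b", "a", "a", "b"],
    "D": ["c", "b", "b", "c"],
    "status_color": ["red", "green", "red", "green"],
})

out = (
    df.pivot_table(index=["B", "C", "D"], columns="A",
                   values="status_color", aggfunc="first")
      .astype(object)              # allow strings and the integer 0 side by side
      .fillna(0)                   # 0 where an A value is absent from the group
      .reset_index()
      .rename_axis(columns=None)
)
print(out)
```

Rows come out sorted by the B, C, D keys, and every X column ends up with object dtype because it mixes color strings with 0.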

Upvotes: 0

Dani Mesejo

Reputation: 61920

Use groupby + str.get_dummies:

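# Join the string values of each (B, C, D) group into one "|"-separated string
group = df.groupby(["B", "C", "D"], sort=False).agg("|".join)
# str.get_dummies splits on "|" by default and one-hot encodes the pieces
res = group["A"].str.get_dummies().reset_index()
print(res)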

Output

   B  C  D  X1  X2  X3  X4
0  a  b  c   1   0   0   1
1  a  a  b   0   1   1   0
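
For a copy-paste check, the sketch below rebuilds the question's Input as a DataFrame (the answer does not show how df was created, so this construction is an assumption) and applies the same groupby + str.get_dummies idea. It selects column A before aggregating, so only A has to hold strings.

```python
import pandas as pd

# Sample data rebuilt from the question's Input table
df = pd.DataFrame({
    "A": ["X1", "X2", "X3", "X4"],
    "B": ["a", "a", "a", "a"],
    "C": ["b", "a", "a", "b"],
    "D": ["c", "b", "b", "c"],
    "status_color": ["red", "green", "red", "green"],
})

# One "|"-joined string of A values per (B, C, D) combination
joined = df.groupby(["B", "C", "D"], sort=False)["A"].agg("|".join)

# Split each string on "|" and turn the pieces into indicator columns
res = joined.str.get_dummies(sep="|").reset_index()
print(res)
```

Passing sort=True (the groupby default) instead would order the rows by the B, C, D keys.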

Upvotes: 2
