Reputation: 29
I have a DataFrame with columns A, B, C, D. Column A has the values X1, X2, X3, X4, X5. I want to create one column for each of those values and print 1 in it when the B, C, D values of that row are duplicated in another row.
Please provide an answer using PySpark or pandas.
Input
A B C D status_color
X1 a b c red
X2 a a b green
X3 a a b red
X4 a b c green
Output
B C D X1 X2 X3 X4
a b c red 0 0 green
a a b 0 green red 0
I tried to find the duplicated rows and create a duplicate_flag column, with the aim of printing status_color where the other columns are duplicated: df['duplicate_flag'] = df.duplicated(subset=['B', 'C', 'D'])
My problem is that I don't know how to compare this with column A and print the result in X1, X2, X3, X4.
Can anyone help with Python? I am new to Python.
Upvotes: 2
Views: 67
Reputation: 262149
Use pandas.crosstab:
out = (pd.crosstab([df['B'], df['C'], df['D']], df['A'])
         .clip(upper=1)  # cap counts at 1; only needed if the same A can repeat within a B, C, D group
         .reset_index().rename_axis(columns=None)
      )
output:
B C D X1 X2 X3 X4
0 a a b 0 1 1 0
1 a b c 1 0 0 1
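The expected output in the question shows the status_color values rather than 1. A minimal sketch of one way to get that, assuming each A value appears at most once per B, C, D combination (if it does not, aggfunc="first" silently keeps the first color). The variable name colors is only illustrative:

```python
import pandas as pd

# Sample frame rebuilt from the question's input table.
df = pd.DataFrame({
    "A": ["X1", "X2", "X3", "X4"],
    "B": ["a", "a", "a", "a"],
    "C": ["b", "a", "a", "b"],
    "D": ["c", "b", "b", "c"],
    "status_color": ["red", "green", "red", "green"],
})

# One row per (B, C, D) group, one column per value of A.
# Cells hold the status_color; missing combinations become 0.
colors = (
    df.pivot_table(index=["B", "C", "D"], columns="A",
                   values="status_color", aggfunc="first", fill_value=0)
      .reset_index()
      .rename_axis(columns=None)
)
print(colors)
```

As with crosstab, the groups come back sorted by B, C, D, so the row order can differ from the order in the question.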
Upvotes: 0
Reputation: 61920
Use groupby + str.get_dummies:
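# Join the string columns of each (B, C, D) group with "|", e.g. A becomes "X1|X4"
group = df.groupby(["B", "C", "D"], sort=False).agg("|".join)
# Split the joined A values back into one 0/1 indicator column per value
res = group["A"].str.get_dummies().reset_index()
print(res)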
Output
B C D X1 X2 X3 X4
0 a b c 1 0 0 1
1 a a b 0 1 1 0
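In the sample data every B, C, D combination occurs twice, so no filtering is needed. If unique combinations should be left out, the duplicated call from the question can probably be used as a filter first. A rough sketch, where the X5 row is a hypothetical addition that is not in the question's data:

```python
import pandas as pd

# The question's rows plus one hypothetical row (X5) with a unique B, C, D combination.
df = pd.DataFrame({
    "A": ["X1", "X2", "X3", "X4", "X5"],
    "B": ["a", "a", "a", "a", "z"],
    "C": ["b", "a", "a", "b", "z"],
    "D": ["c", "b", "b", "c", "z"],
})

# keep=False flags every member of a duplicated group, not only the later ones.
dupes = df[df.duplicated(subset=["B", "C", "D"], keep=False)]

# Same groupby + str.get_dummies step, applied to the duplicated rows only.
flags = (
    dupes.groupby(["B", "C", "D"], sort=False)["A"]
         .agg("|".join)
         .str.get_dummies()
         .reset_index()
)
print(flags)
```

Note that the default keep='first' would leave the first row of each group unflagged, which is why keep=False is used here.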
Upvotes: 2