Reputation: 6280
I want to insert a duplicate of a row whenever that row introduces a new entry in a specific column, and in the duplicated row some columns should be left empty. The dataframe looks like:
number value area typ
1 10 B A
2 20 B A
3 10 B B
4 20 B B
5 30 B B
My expected dataframe would be:
number value area typ
                B   A
     1    10    B   A
     2    20    B   A
                B   B
     3    10    B   B
     4    20    B   B
     5    30    B   B
So a row is duplicated whenever it introduces a new value of typ (a typ that has not appeared in any earlier row), and the columns number and value are left empty in the duplicate.
Upvotes: 0
Views: 69
Reputation: 150825
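You can take the first row of each group with drop_duplicates, blank the columns you want empty, and concat the result with the original. A stable sort on the index then places each blank row directly above the row it was copied from: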
pd.concat((df.drop_duplicates(['area','typ']).assign(number='',value=''), df)
).sort_index(kind='mergesort')
Output:
  number value area typ
0                 B   A
0      1    10    B   A
1      2    20    B   A
2                 B   B
2      3    10    B   B
3      4    20    B   B
4      5    30    B   B
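For anyone who wants to run this end to end, here is a minimal sketch. The construction of `df` is an assumption reconstructed from the table printed in the question, and the names `headers` and `result` are only illustrative. The stable `mergesort` is what keeps each blanked row ahead of the original row that shares its index label.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question (assumed construction).
df = pd.DataFrame({
    "number": [1, 2, 3, 4, 5],
    "value": [10, 20, 10, 20, 30],
    "area": ["B"] * 5,
    "typ": ["A", "A", "B", "B", "B"],
})

# First row of every (area, typ) combination, with number/value blanked.
headers = df.drop_duplicates(["area", "typ"]).assign(number="", value="")

# headers come first in the concat, so a stable sort on the index
# leaves each blank row directly above the row it was copied from.
result = pd.concat((headers, df)).sort_index(kind="mergesort")
print(result)
```

Note that the duplicated index labels (0 and 2) remain in `result`; calling `reset_index(drop=True)` afterwards should give a clean 0..n-1 index if that is preferred.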
Update: if several columns need to be emptied:
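cols = ['area','typ']
# .copy() avoids a SettingWithCopyWarning when the columns are overwritten below
new_df = df.drop_duplicates(cols).copy()
# blank every column that is not one of the key columns
for col in new_df.columns:
    if col not in cols:
        new_df[col] = ''
pd.concat((new_df, df)).sort_index(kind='mergesort')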
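The same idea can be wrapped in a small helper. This is only a sketch: the function name `add_group_headers` and its `keys` and `fill` parameters are hypothetical, the sample `df` is reconstructed from the question, and it assumes the column names are strings (they are passed to `assign` as keyword arguments).

```python
import pandas as pd

def add_group_headers(df, keys, fill=""):
    """Insert a blanked copy of the first row of each `keys` combination.

    Hypothetical helper: every column not listed in `keys` is set to `fill`
    in the inserted rows. Column names are assumed to be strings.
    """
    blanks = {col: fill for col in df.columns if col not in keys}
    headers = df.drop_duplicates(keys).assign(**blanks)
    # Stable sort keeps each header row above the row it was copied from;
    # reset_index then replaces the duplicated labels with 0..n-1.
    combined = pd.concat((headers, df)).sort_index(kind="mergesort")
    return combined.reset_index(drop=True)

# Sample frame rebuilt from the table in the question (assumed construction).
df = pd.DataFrame({
    "number": [1, 2, 3, 4, 5],
    "value": [10, 20, 10, 20, 30],
    "area": ["B"] * 5,
    "typ": ["A", "A", "B", "B", "B"],
})

out = add_group_headers(df, ["area", "typ"])
print(out)
```

Building the blanked rows with `assign` returns a new frame rather than modifying a slice of `df`, so the original dataframe is left untouched.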
Upvotes: 3