Reputation: 2978
I have the following DataFrame df:
import pandas as pd

d = {'1': ['25', 'AAA', 2], '2': ['30', 'BBB', 3], '3': ['5', 'CCC', 2],
     '4': ['300', 'DDD', 2], '5': ['30', 'DDD', 3], '6': ['100', 'AAA', 3]}
columns = ['Price', 'Name', 'Class']
# Each dict key becomes a row label; each list holds that row's values.
df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = columns
I want to duplicate rows based on values of the column Class. In particular, I want to randomly select rows where Class is equal to 3, and duplicate them. For example, in the current df I have 3 rows with Class equal to 3. How can I create N duplicates, where N is configurable, for example:
N = 2
target_column = "Class"
target_value = 3
new_df = create_duplicates(df, target_column, target_value, N)
I was thinking of using a for-loop and, at each iteration (when Class is equal to 3), generating a random number. If it's greater than 0.5, the row is added to a list of selected rows. This process continues until the list of selected rows contains N rows. Then these N rows are appended to df.
Is there a more elegant and shorter way to do the same? Maybe some built-in pandas functions?
Upvotes: 0
Views: 1412
Reputation: 134
I think the script below will do what you need. I lifted the repetition part from: Repeat Rows in Data Frame n Times
n = 3
# Filter first, then repeat the filtered frame's own index labels n times.
matches = df[df['Class'] == 3]
pd.concat([df, matches.loc[matches.index.repeat(n)]]).sort_values('Name')
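Note that the snippet above appends n copies of every row with Class equal to 3, rather than N randomly chosen rows as the question asks. A minimal sketch of the random variant, assuming DataFrame.sample is acceptable for the random selection, might look like the following. The function name create_duplicates and its first four parameters come from the question; the random_state parameter and the choice to sample with replacement only when N exceeds the number of matching rows are my own assumptions.

```python
import pandas as pd


def create_duplicates(df, target_column, target_value, n, random_state=None):
    """Return df with n randomly chosen matching rows appended as duplicates."""
    matches = df[df[target_column] == target_value]
    # Assumption: sample without replacement when enough matching rows exist,
    # and fall back to sampling with replacement when n exceeds their count.
    picked = matches.sample(n=n, replace=n > len(matches),
                            random_state=random_state)
    return pd.concat([df, picked])


d = {'1': ['25', 'AAA', 2], '2': ['30', 'BBB', 3], '3': ['5', 'CCC', 2],
     '4': ['300', 'DDD', 2], '5': ['30', 'DDD', 3], '6': ['100', 'AAA', 3]}
df = pd.DataFrame.from_dict(d, orient='index',
                            columns=['Price', 'Name', 'Class'])

N = 2
new_df = create_duplicates(df, "Class", 3, N, random_state=0)
```

The duplicated rows keep their original index labels, so new_df has repeated labels; pass ignore_index=True to pd.concat if a fresh index is preferred. If no row matches target_value and n is greater than 0, sample raises a ValueError, so a guard may be needed for that case.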
Upvotes: 1