Claudiu Creanga

Reputation: 8366

In pandas apply method, duplicate the row based on condition

This is an example of my df:

pd.DataFrame([["1", "2"], ["1", "2"], ["3", "other_value"]],
                     columns=["a", "b"])
    a   b
0   1   2
1   1   2
2   3   other_value

And I want to arrive to this:

pd.DataFrame([["1", "2"], ["1", "2"], ["3", "other_value"], ["3", "row_duplicated_with_edits_in_this_column"]],
                     columns=["a", "b"])
    a   b
0   1   2
1   1   2
2   3   other_value
3   3   row_duplicated_with_edits_in_this_column

The rule is to use the apply method and do some checks (to keep the example simple I'm not including these checks). Under certain conditions, for some rows, the apply function should duplicate the row, make an edit to the copy, and have both rows end up in the df.

So something like:

def f(row):
    if condition_1:
        row["a"] = 3
    elif condition_2:
        row["a"] = 4
    elif condition_3:
        row_duplicated = row.copy()
        row_duplicated["a"] = 5  # I need this row to be included in the df as well

    return row

df.apply(f, axis=1)

I don't want to store the duplicated rows somewhere in my class and add them at the end. I want to do it on the fly.

I've seen this question: "pandas: apply function to DataFrame that can return multiple rows", but I'm unsure whether groupby can help me here.

Thanks

Upvotes: 6

Views: 4282

Answers (3)

jpp

Reputation: 164623

Your logic does seem mostly vectorisable. Since the order of rows in your output appears to be important, you can increment the default RangeIndex by 0.5 and then use sort_index.

def row_appends(x):
    newrows = x.loc[x['a'].isin(['3', '4', '5'])].copy()
    newrows.loc[x['a'] == '3', 'b'] = 10  # make conditional edit
    newrows.loc[x['a'] == '4', 'b'] = 20  # make conditional edit
    newrows.index = newrows.index + 0.5
    return newrows

res = pd.concat([df, df.pipe(row_appends)])\
        .sort_index().reset_index(drop=True)

print(res)

   a            b
0  1            2
1  1            2
2  3  other_value
3  3           10

Upvotes: 2

cs95

Reputation: 402303

Here is one way using df.iterrows inside a list comprehension: build up the resulting rows in a list and then concat them.

def func(row):
    if row['a'] == "3":
        row2 = row.copy()
        # make edits to row2
        return pd.concat([row, row2], axis=1)
    return row

pd.concat([func(row) for _, row in df.iterrows()], ignore_index=True, axis=1).T

   a            b
0  1            2
1  1            2
2  3  other_value
3  3  other_value

I've found that in my case it is better without ignore_index=True because I later on merge 2 dfs.
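As a concrete sketch of this approach, the hypothetical edit below fills in the `# make edits to row2` placeholder with the value from the question's desired output:

```python
import pandas as pd

df = pd.DataFrame([["1", "2"], ["1", "2"], ["3", "other_value"]],
                  columns=["a", "b"])

def func(row):
    # Duplicate rows where a == "3" and edit the copy.
    if row["a"] == "3":
        row2 = row.copy()
        row2["b"] = "row_duplicated_with_edits_in_this_column"  # hypothetical edit
        return pd.concat([row, row2], axis=1)  # two columns: original + edited copy
    return row

# Each func(...) result is a Series (one row) or DataFrame (two rows, as columns);
# concat along axis=1 and transpose to get rows back.
res = pd.concat([func(row) for _, row in df.iterrows()],
                ignore_index=True, axis=1).T
print(res)
```

The edited copy lands directly after its source row, which matches the ordering the question asks for.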

Upvotes: 3

Ludo Schmidt

Reputation: 1403

I would vectorise it, doing it category by category:

df.loc[df_condition_1, "a"] = 3
df.loc[df_condition_2, "a"] = 4

duplicates = df[df_condition_3].copy()  # store the rows to duplicate
duplicates["a"] = 5

# then append the duplicates
df = pd.concat([df, duplicates])

Does this solution fit your needs?
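A runnable sketch of this vectorised idea on the example df (the condition is hypothetical, standing in for the OP's unspecified checks):

```python
import pandas as pd

df = pd.DataFrame([["1", "2"], ["1", "2"], ["3", "other_value"]],
                  columns=["a", "b"])

# Hypothetical condition standing in for the OP's checks.
df_condition_3 = df["a"] == "3"

# Copy the matching rows and edit the copies in bulk.
duplicates = df[df_condition_3].copy()
duplicates["b"] = "row_duplicated_with_edits_in_this_column"

# Append the edited copies; here they land after the originals.
res = pd.concat([df, duplicates]).reset_index(drop=True)
print(res)
```

Note that appending puts all duplicates at the end; if a duplicate must sit directly below its source row, sort on the original index first (e.g. with the fractional-index trick from jpp's answer).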

Upvotes: 1
