Reputation: 8366
This is an example of my df:
pd.DataFrame([["1", "2"], ["1", "2"], ["3", "other_value"]],
columns=["a", "b"])
   a            b
0  1            2
1  1            2
2  3  other_value
And I want to arrive at this:
pd.DataFrame([["1", "2"], ["1", "2"], ["3", "other_value"], ["3", "row_duplicated_with_edits_in_this_column"]],
columns=["a", "b"])
   a                                          b
0  1                                          2
1  1                                          2
2  3                                other_value
3  3  row_duplicated_with_edits_in_this_column
The rule is to use the apply method and do some checks (to keep the example simple I'm not including these checks); under certain conditions, for some rows inside the apply function, I need to duplicate the row, edit the duplicate, and insert both rows into the df.
So something like:
def f(row):
    if condition:
        row["a"] = 3
    elif condition:
        row["a"] = 4
    elif condition:
        row_duplicated = row.copy()
        row_duplicated["a"] = 5  # I also need this row to be included in the df
    return row

df.apply(f, axis=1)
I don't want to store the duplicated rows somewhere in my class and add them at the end. I want to do it on the fly.
I've seen this question, pandas: apply function to DataFrame that can return multiple rows, but I'm unsure whether groupby can help me here.
Thanks
Upvotes: 6
Views: 4282
Reputation: 164623
Your logic does seem mostly vectorisable. Since the order of rows in your output appears to be important, you can increment the default RangeIndex by 0.5 and then use sort_index.
def row_appends(x):
    newrows = x.loc[x['a'].isin(['3', '4', '5'])].copy()
    newrows.loc[x['a'] == '3', 'b'] = 10  # make conditional edit
    newrows.loc[x['a'] == '4', 'b'] = 20  # make conditional edit
    newrows.index = newrows.index + 0.5
    return newrows

res = pd.concat([df, df.pipe(row_appends)])\
        .sort_index().reset_index(drop=True)
print(res)
   a            b
0  1            2
1  1            2
2  3  other_value
3  3           10
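For intuition, here is a minimal sketch (on the question's data, with a made-up condition) of why the half-index trick preserves order: adding 0.5 to a duplicate's index makes it sort immediately after the row it was copied from.
import pandas as pd

df = pd.DataFrame([["1", "2"], ["1", "2"], ["3", "other_value"]],
                  columns=["a", "b"])

dup = df[df["a"] == "3"].copy()  # hypothetical condition
dup["b"] = "row_duplicated_with_edits_in_this_column"
dup.index = dup.index + 0.5      # 2 -> 2.5, sorts right after index 2

out = pd.concat([df, dup]).sort_index().reset_index(drop=True)
print(out)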
Upvotes: 2
Reputation: 402303
Here is one way using df.iterrows inside a list comprehension: collect the (possibly duplicated) rows as you loop, then concat them back together.
def func(row):
    if row['a'] == "3":
        row2 = row.copy()
        # make edits to row2
        return pd.concat([row, row2], axis=1)
    return row
pd.concat([func(row) for _, row in df.iterrows()], ignore_index=True, axis=1).T
   a            b
0  1            2
1  1            2
2  3  other_value
3  3  other_value
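A note on the axis=1/.T pattern above: each row arrives as a Series, so concatenating along axis=1 stacks the rows side by side as columns, and the final .T flips them back into rows. A minimal sketch with made-up values:
import pandas as pd

s1 = pd.Series({"a": "3", "b": "other_value"}, name=2)
s2 = s1.copy()
s2["b"] = "edited"                  # hypothetical edit

wide = pd.concat([s1, s2], axis=1)  # two Series become two columns
print(wide.T)                       # transposing restores them as rows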
I've found that in my case it is better without ignore_index=True, because I later merge two dfs.
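To see why: without ignore_index=True, the duplicated row keeps the label of the row it was copied from, so the result still aligns with the source df in a later index-based merge. A sketch under the same setup as above:
res = pd.concat([func(row) for _, row in df.iterrows()], axis=1).T
print(res.index)  # the duplicate reuses its source label, e.g. Index([0, 1, 2, 2])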
Upvotes: 3
Reputation: 1403
I would vectorise it, doing it category by category:
df[df_condition_1]["a"] = 3
df[df_condition_2]["a"] = 4
duplicates = df[df_condition_3] # somehow we store it ?
duplicates["a"] = 5
#then
df.join(duplicates, how='outer')
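As a quick sanity check on the question's df, with a made-up condition standing in for df_condition_3:
duplicates = df[df["a"] == "3"].copy()  # hypothetical condition
duplicates["a"] = 5
print(pd.concat([df, duplicates], ignore_index=True))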
Does this solution fit your needs?
Upvotes: 1