Reputation: 1781
Let's say, I have the below data frame as sample.
name age status price
0 frank 12 1 100
1 jack 33 0 190
2 joe 22 1 200
******************************
Desired output:
name age status price
0 frank 12 1 100
1 jack 33 0 190
2 jack NaN 0 -190
3 joe 22 1 200
******************************
I also posted the sample data frame below, so you can test it easily.
df1 = pd.DataFrame({
"name":["frank", "jack", "joe"],
"age": [12, 33, 22],
"status": [1,0, 1],
"price": [100,190, 200]
})
As you can see, I want to insert a new row based on the above row, which status
is 0, it's meant to be a transaction failure. to statistics more easily, I want to generate a new row below that one. I also want its price
to be negative number. and since I don't care about whole column of the new row, so I want the other column be NaN
, just like 'age' here in my desired output.
What I've tried so far.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
"name":["frank", "jack", "joe"],
"age": [12, 33, 22],
"status": [1,0, 1],
"price": [100,190, 200]
})
df2_list = []
for i, row in df1.iterrows():
if row["status"] == 0:
origin_row = row.to_dict()
new_row = ({
"name": origin_row.get("name"),
#"age": origin_row.get("age"),
"age": np.NaN,
"status": origin_row.get("status"),
"price": -origin_row.get("price"),
})
df2_list.append(new_row)
df2 = pd.DataFrame(df2_list)
# concat df1 and df2 and sort it .
df3 = pd.concat([df1, df2], ignore_index=True)
df4 = df3.sort_values(['name', 'price'], ascending=[True, False])
print(df4)
I have a loop, and check if it hit my condition status==0
, and append it on my tmp list, and ...but it's too many code. I want to know is there any good way, I mean more Pythonic code or pandas has already got some function can it ?
Upvotes: 4
Views: 2725
Reputation: 403128
Use numpy.repeat
to add rows, and Series.duplicated
to set the price.
df2 = pd.DataFrame(df.values.repeat(df.status.eq(0)+1, axis=0), columns=df.columns)
df2.loc[df2.name.duplicated(), 'price'] *= -1
df2
name age status price
0 frank 12 1 100
1 jack 33 0 190
2 jack 33 0 -190
3 joe 22 1 200
If you need to mask NaNs in the age column as well, you can do that with Series.mask
.
df2.age.mask(df2.name.duplicated())
0 12
1 33
2 NaN
3 22
Name: age, dtype: object
Full code.
df2 = pd.DataFrame(df.values.repeat(df.status.eq(0)+1, axis=0), columns=df.columns)
isdup = df2.name.duplicated()
df2.loc[isdup, 'price'] *= -1
df2['age'] = df2['age'].mask(isdup)
df2
name age status price
0 frank 12 1 100
1 jack 33 0 190
2 jack NaN 0 -190
3 joe 22 1 200
Upvotes: 3