Reputation: 21
I just started getting into pandas. I have searched through many sources and could not find a solution to my problem. Hope to learn from the specialists here.
This is the original dataframe:
Country | Sales | Item_A | Item_B |
---|---|---|---|
UK | 28 | 20 | 30 |
Asia | 75 | 15 | 20 |
USA | 100 | 30 | 40 |
Assume that the Sales column is always sorted in ascending order from lowest to highest.
Let's say Sales = 50 and Country = 'UK' are given, how do I
This is the expected result:
Country | Sales | Item_A | Item_B |
---|---|---|---|
UK | 28 | 20 | 30 |
UK | 50 | 17.7 | 25.3 |
Asia | 75 | 15 | 20 |
Upvotes: 2
Views: 1516
Reputation: 321
First, I would recommend simply adding the new row at the bottom and then sorting by the column, so that the row moves to your preferred position.
new = {'Country': ['UK'], 'Sales': [50]}
df = pd.concat([df, pd.DataFrame(new)]).sort_values(by=["Sales"]).reset_index(drop=True)
Country Sales Item_A Item_B
0 UK 28 20.0 30.0
1 UK 50 NaN NaN
2 Asia 75 15.0 20.0
3 USA 100 30.0 40.0
The second line appends the new row (concat), sorts by the Sales column (sort_values) so the row moves to its place, and then renumbers the index (reset_index).
If you have reasons to insert the row directly at a given position: I am not aware of a pandas insert for rows, only for columns. So my recommendation would be to split the original dataframe into the rows before and the rows after. To do so, you first need to find the index where your new row should go.
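def check_index(value):
    # Sort the Sales values and collect the positions of those smaller than value
    ruler = sorted(df["Sales"])
    ruled = [i for i in range(len(ruler)) if ruler[i] < value]
    # The new row goes right after the last smaller value (position 0 if there is none)
    return max(ruled) + 1 if ruled else 0
This function sorts the relevant column of the original dataframe, compares each value against the new one, and returns the index where your new row should go.
idx = check_index(new["Sales"][0])  # new["Sales"] is a list, so pass its single element
df = pd.concat([df[:idx], pd.DataFrame(new), df[idx:]]).reset_index(drop=True)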
Country Sales Item_A Item_B
0 UK 28 20.0 30.0
1 UK 50 NaN NaN
2 Asia 75 15.0 20.0
3 USA 100 30.0 40.0
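As a side note, since the question says Sales is always sorted in ascending order, the insertion index can probably also be found with `Series.searchsorted` instead of a hand-written helper. This is only a minimal sketch: the dataframe is rebuilt from the question's table, and the names `new_row`, `pos` and `result` are just illustrative.

```python
import pandas as pd

# Data rebuilt from the table in the question
df = pd.DataFrame({
    "Country": ["UK", "Asia", "USA"],
    "Sales": [28, 75, 100],
    "Item_A": [20, 15, 30],
    "Item_B": [30, 20, 40],
})
new_row = pd.DataFrame({"Country": ["UK"], "Sales": [50]})

# Assumption: Sales is already sorted ascending, so searchsorted returns
# the position where 50 can be inserted while keeping the order.
pos = int(df["Sales"].searchsorted(50))

# Split at that position and put the new row in between
result = pd.concat([df.iloc[:pos], new_row, df.iloc[pos:]], ignore_index=True)
print(result)
```

This also covers the case where the new value is smaller than every existing one, because `searchsorted` then simply returns 0.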
This splits your dataframe and concatenates the rows before, the new row, and the rows after. For the second part of your request, you could apply the same fillna directly by naming the columns, but here I select the numeric columns first, since we are going to do arithmetic on them. We use shift to pick the previous and next values, then take their average (half of their sum).
for col in df.select_dtypes(include="number").columns:
    # Fill each missing value with the mean of the previous and next rows
    df[col] = df[col].fillna((df[col].shift() + df[col].shift(-1)) / 2)
Country Sales Item_A Item_B
0 UK 28 20.0 30.0
1 UK 50 17.5 25.0
2 Asia 75 15.0 20.0
3 USA 100 30.0 40.0
Note, however, that if the new row becomes the first row of the dataframe, its values will still be NaN, since there is no previous row to calculate with. For that case I added a second fillna call, which you can replace with the value or calculation of your choice.
Country Sales Item_A Item_B
0 UK 10 NaN NaN
1 UK 28 20.0 30.0
2 UK 50 NaN NaN
3 Asia 75 15.0 20.0
4 USA 100 30.0 40.0
for col in df.select_dtypes(include="number").columns:
    df[col] = df[col].fillna((df[col].shift() + df[col].shift(-1)) / 2)
    # Fallback for a new first row: there is no previous value, so use half of the next value
    df[col] = df[col].fillna(df[col].shift(-1) / 2)
Country Sales Item_A Item_B
0 UK 10 10.0 15.0
1 UK 28 20.0 30.0
2 UK 50 17.5 25.0
3 Asia 75 15.0 20.0
4 USA 100 30.0 40.0
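One more remark: the averaging above gives 17.5 and 25.0 for the new row, while the expected result in the question shows 17.7 and 25.3. Those numbers look like a linear interpolation weighted by the distance in Sales (50 lies 22/47 of the way from 28 to 75). If that is what is wanted, a sketch along these lines might work, using `interpolate(method="index")` with Sales as the index. The names `combined`, `item_cols` and `interpolated` are only illustrative, and the data is rebuilt from the question's table.

```python
import pandas as pd

# Data rebuilt from the table in the question
df = pd.DataFrame({
    "Country": ["UK", "Asia", "USA"],
    "Sales": [28, 75, 100],
    "Item_A": [20, 15, 30],
    "Item_B": [30, 20, 40],
})
new_row = pd.DataFrame({"Country": ["UK"], "Sales": [50]})

# Append, sort by Sales, and use Sales as the index so that
# method="index" interpolates according to the actual Sales values.
combined = (
    pd.concat([df, new_row], ignore_index=True)
    .sort_values("Sales")
    .set_index("Sales")
)

# Interpolate only the item columns (Country is text and is left alone)
item_cols = ["Item_A", "Item_B"]
combined[item_cols] = combined[item_cols].interpolate(method="index")

# Restore Sales as a column and the original column order
interpolated = combined.reset_index()[["Country", "Sales", "Item_A", "Item_B"]]
print(interpolated.round(1))
```

For the new row this gives roughly 17.66 and 25.32, which round to the 17.7 and 25.3 from the question.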
Upvotes: 1