Linus Ang

Reputation: 21

Pandas: Extract the row before and the row after based on a given value

I just started getting into pandas. I have searched through many sources and could not find a solution to my problem. Hope to learn from the specialists here.

This is the original dataframe:

Country  Sales  Item_A  Item_B
UK          28      20      30
Asia        75      15      20
USA        100      30      40

Assume that the Sales column is always sorted in ascending order from lowest to highest.

Let's say I am given Sales = 50 and Country = 'UK'. How do I:

  1. Identify the two rows that have the closest Sales value w.r.t. 50?
  2. Insert a new row between the two rows with the given Country and Sales?
  3. Interpolate the values for Item_A and Item_B?

This is the expected result:

Country  Sales  Item_A  Item_B
UK          28    20.0    30.0
UK          50    17.7    25.3
Asia        75    15.0    20.0

Upvotes: 2

Views: 1516

Answers (1)

hteza

Reputation: 321

First, I would recommend simply adding the new row at the bottom and sorting by the Sales column so that the row moves to its proper position.

import pandas as pd
# New row to insert; Item_A and Item_B are not given, so they become NaN
new = {'Country': ['UK'], 'Sales': [50]}
df = pd.concat([df, pd.DataFrame(new)]).sort_values(by=["Sales"]).reset_index(drop=True)

  Country  Sales  Item_A  Item_B
0      UK     28    20.0    30.0
1      UK     50     NaN     NaN
2    Asia     75    15.0    20.0
3     USA    100    30.0    40.0

The last line appends the new row (concat), sorts by the Sales column so the row moves to its proper position (sort_values), and then renumbers the index from 0 (reset_index).

If you have a reason to insert the row directly at a given position, note that I am not aware of a pandas insert method for rows, only for columns. My recommendation would be to split the original dataframe into the rows before and the rows after the new one. To do so, you first need to find the position where the new row belongs.

def check_index(value):
    # Sales is sorted ascending, so the number of rows with Sales below
    # `value` is the position at which the new row belongs
    return int((df["Sales"] < value).sum())

This function counts the rows of the original dataframe whose Sales value is below the new one, which gives the position where the new row should go.

pos = check_index(new["Sales"][0])
df = pd.concat([df[:pos], pd.DataFrame(new), df[pos:]]).reset_index(drop=True)

  Country  Sales  Item_A  Item_B
0      UK     28    20.0    30.0
1      UK     50     NaN     NaN
2    Asia     75    15.0    20.0
3     USA    100    30.0    40.0
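
As an alternative to a hand-written lookup, pandas has Series.searchsorted, which returns the insertion position in an already sorted column. Below is a minimal sketch, assuming the dataframe is built from the table in the question and that Sales is sorted ascending; the names pos and result are only for illustration.

```python
import pandas as pd

# Dataframe rebuilt from the table in the question (assumption)
df = pd.DataFrame({
    "Country": ["UK", "Asia", "USA"],
    "Sales": [28, 75, 100],
    "Item_A": [20, 15, 30],
    "Item_B": [30, 20, 40],
})
new = pd.DataFrame({"Country": ["UK"], "Sales": [50]})

# Position at which Sales = 50 keeps the column sorted
pos = int(df["Sales"].searchsorted(50))

# Rows before, the new row, then rows after
result = pd.concat([df.iloc[:pos], new, df.iloc[pos:]]).reset_index(drop=True)
print(result)
```

If Sales is not sorted, searchsorted does not give a meaningful position, so sort first in that case.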

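This splits your dataframe and concatenates the rows before, the new row, and the rows after. For the interpolation part of your request, you could name the columns directly, but here I select the numeric columns first, since we are going to do arithmetic on them. We use shift to get the previous and the next values, then take their average. Note that this is the midpoint of the two neighbours, not an interpolation weighted by Sales, which is why the result below (17.5 and 25.0) differs slightly from the 17.7 and 25.3 in your expected output.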

# Fill each NaN with the average of the previous and the next row
for col in df.select_dtypes(include='number').columns:
    df[col] = df[col].fillna((df[col].shift() + df[col].shift(-1))/2)

  Country  Sales  Item_A  Item_B
0      UK     28    20.0    30.0
1      UK     50    17.5    25.0
2    Asia     75    15.0    20.0
3     USA    100    30.0    40.0
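
If you want the values weighted by where Sales = 50 falls between 28 and 75, one option is to make Sales the index and call interpolate with method="index". This is only a sketch under some assumptions: the dataframe is rebuilt from the table in the question, the Sales values are unique, and the names combined, item_cols and result are mine.

```python
import pandas as pd

# Dataframe rebuilt from the table in the question (assumption)
df = pd.DataFrame({
    "Country": ["UK", "Asia", "USA"],
    "Sales": [28, 75, 100],
    "Item_A": [20, 15, 30],
    "Item_B": [30, 20, 40],
})
new = pd.DataFrame({"Country": ["UK"], "Sales": [50]})

# Append the new row, sort by Sales, and use Sales as the index so that
# interpolation is weighted by the actual Sales values
combined = pd.concat([df, new]).sort_values("Sales").set_index("Sales")

item_cols = ["Item_A", "Item_B"]
combined[item_cols] = combined[item_cols].interpolate(method="index")

# Restore the original column order
result = combined.reset_index()[["Country", "Sales", "Item_A", "Item_B"]]
print(result.round({"Item_A": 1, "Item_B": 1}))
```

For the new row this gives roughly 17.7 for Item_A and 25.3 for Item_B, since 50 lies 22/47 of the way from 28 to 75.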

Please note that if the new row ends up as the first row of the dataframe, its values will still be NaN, since there is no previous row to calculate with (the same applies to a new last row, which has no next row). For that case I added a second fillna call; you can replace it with the value or calculation of your choice.

  Country  Sales  Item_A  Item_B
0      UK     10     NaN     NaN
1      UK     28    20.0    30.0
2      UK     50     NaN     NaN
3    Asia     75    15.0    20.0
4     USA    100    30.0    40.0

for col in df.select_dtypes(include='number').columns:
    df[col] = df[col].fillna((df[col].shift() + df[col].shift(-1))/2)
    # Fallback for a new first row: use half of the next row's value
    df[col] = df[col].fillna(df[col].shift(-1)/2)

  Country  Sales  Item_A  Item_B
0      UK     10    10.0    15.0
1      UK     28    20.0    30.0
2      UK     50    17.5    25.0
3    Asia     75    15.0    20.0
4     USA    100    30.0    40.0
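
Halving the next row's value is just one possible choice. If you would rather copy the nearest existing row for a new first or last row, a sketch with bfill and ffill could look like this. The dataframe below is typed in from the table with Sales = 10 and 50 already inserted, and the variable midpoint is only for illustration.

```python
import pandas as pd

nan = float("nan")

# State after inserting the Sales = 10 and Sales = 50 rows (assumption)
df = pd.DataFrame({
    "Country": ["UK", "UK", "UK", "Asia", "USA"],
    "Sales": [10, 28, 50, 75, 100],
    "Item_A": [nan, 20.0, nan, 15.0, 30.0],
    "Item_B": [nan, 30.0, nan, 20.0, 40.0],
})

for col in ["Item_A", "Item_B"]:
    # Average of the previous and next rows, where both exist
    midpoint = (df[col].shift() + df[col].shift(-1)) / 2
    # Edge rows still NaN after this are copied from the nearest row:
    # bfill covers a new first row, ffill covers a new last row
    df[col] = df[col].fillna(midpoint).bfill().ffill()

print(df)
```

Here the Sales = 10 row takes the values of the Sales = 28 row (20.0 and 30.0), and the Sales = 50 row gets the midpoint (17.5 and 25.0).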

Upvotes: 1
