Reputation: 21

Pandas: Extract the row before and the row after based on a given value

I just started learning pandas. I have searched through many sources but could not find a solution to my problem, so I hope to learn from the specialists here.

This is the original dataframe:

Country	Sales	Item_A	Item_B
UK	28	20	30
Asia	75	15	20
USA	100	30	40

Assume that the Sales column is always sorted in ascending order.

Given Sales = 50 and Country = 'UK', how do I:

1. Identify the two rows whose Sales values are closest to 50, one on either side?
2. Insert a new row between those two rows with the given Country and Sales?
3. Interpolate the values for Item_A and Item_B?
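For step 1, because Sales is sorted, the two neighbours of the target are the last row below it and the first row at or above it. A minimal sketch with boolean masks, assuming the sample frame from the question and a hypothetical variable name `target` for the given Sales value:

```python
import pandas as pd

# Sample data from the question; Sales is assumed to be sorted ascending
df = pd.DataFrame({
    "Country": ["UK", "Asia", "USA"],
    "Sales": [28, 75, 100],
    "Item_A": [20, 15, 30],
    "Item_B": [30, 20, 40],
})
target = 50  # hypothetical name for the given Sales value

# Last row below the target and first row at or above it.
# iloc raises IndexError if the target lies outside the Sales range.
before = df[df["Sales"] < target].iloc[-1]
after = df[df["Sales"] >= target].iloc[0]
print(before["Country"], before["Sales"], after["Country"], after["Sales"])  # UK 28 Asia 75
```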

This is the expected result:

Country	Sales	Item_A	Item_B
UK	28	20	30
UK	50	17.7	25.3
Asia	75	15	20
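The expected numbers match straight-line interpolation on Sales between the UK (28) and Asia (75) rows, rounded to one decimal. That is my reading of the expected result; the question does not name the method. The arithmetic, with a hypothetical helper `lerp`:

```python
def lerp(x, x0, x1, y0, y1):
    """Linear interpolation of y at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# Neighbouring rows from the question: UK (Sales 28) and Asia (Sales 75)
item_a = lerp(50, 28, 75, 20, 15)  # 20 - 110/47, about 17.66
item_b = lerp(50, 28, 75, 30, 20)  # 30 - 220/47, about 25.32
print(round(item_a, 1), round(item_b, 1))  # 17.7 25.3
```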

Upvotes: 2

Answers (1)

hteza

Reputation: 321

First, I would recommend simply adding the new row at the bottom and sorting by the Sales column, so that the row moves to the position you want.

import pandas as pd
# Append the new row, then sort by Sales so it moves between its neighbours
new = {'Country': ['UK'], 'Sales': [50]}
df = pd.concat([df, pd.DataFrame(new)]).sort_values(by=["Sales"]).reset_index(drop=True)

  Country  Sales  Item_A  Item_B
0      UK     28    20.0    30.0
1      UK     50     NaN     NaN
2    Asia     75    15.0    20.0
3     USA    100    30.0    40.0

The last line appends the new row (concat), sorts by the Sales column (sort_values), which moves the row into position, and then renumbers the index from 0 (reset_index(drop=True)).

But if you have a reason to insert the row directly at a given position: as far as I know, pandas has an insert method only for columns (DataFrame.insert), not for rows. So my recommendation would be to split the original dataframe into the rows before and the rows after the insertion point. To do so, you first need to find the index at which the new row should go.

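def check_index(value):
    # Position at which `value` keeps the Sales column sorted:
    # the number of existing Sales values smaller than it
    ruler = sorted(df["Sales"])
    return len([s for s in ruler if s < value])

This function sorts the Sales column of the original dataframe, counts how many values are smaller than the given one, and returns that count as the position where the new row should go. If the new value is smaller than every existing Sales value, it returns 0 and the row goes first.

pos = check_index(new["Sales"][0])
df = pd.concat([df[:pos], pd.DataFrame(new), df[pos:]]).reset_index(drop=True)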

  Country  Sales  Item_A  Item_B
0      UK     28    20.0    30.0
1      UK     50     NaN     NaN
2    Asia     75    15.0    20.0
3     USA    100    30.0    40.0
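pandas can also compute this position directly: Series.searchsorted returns the index at which a value would be inserted to keep an already sorted Series sorted, which is the position check_index looks for (both place the new row before any equal Sales value). A sketch, assuming the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["UK", "Asia", "USA"],
    "Sales": [28, 75, 100],
    "Item_A": [20, 15, 30],
    "Item_B": [30, 20, 40],
})
new = {"Country": ["UK"], "Sales": [50]}

# searchsorted assumes Sales is sorted ascending; side="left" is the default
pos = int(df["Sales"].searchsorted(new["Sales"][0]))

# Rows before, the new row, then rows after; renumber the index afterwards
df = pd.concat([df.iloc[:pos], pd.DataFrame(new), df.iloc[pos:]]).reset_index(drop=True)
print(df)
```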

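The pd.concat call splits your dataframe at that position and joins the rows before, the new row, and the rows after. For the second part of your request, you could apply the same fillna directly to Item_A and Item_B by name, but here I select the numeric columns first, since we are going to do arithmetic on them. shift() gives the previous and next values of each column, and their sum divided by 2 fills the gap. Note that this is a plain midpoint of the neighbouring rows, not weighted by Sales, so it gives 17.5 and 25.0 rather than the 17.7 and 25.3 in the question.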

# Fill each missing numeric value with the mean of the rows above and below
for col in df.select_dtypes(include="number").columns:
    df[col] = df[col].fillna((df[col].shift() + df[col].shift(-1)) / 2)

  Country  Sales  Item_A  Item_B
0      UK     28    20.0    30.0
1      UK     50    17.5    25.0
2    Asia     75    15.0    20.0
3     USA    100    30.0    40.0
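The midpoint fill ignores how far 50 lies from 28 and 75. If you want the Sales-weighted values from the question, one option is to make Sales the index and call DataFrame.interpolate with method="index", which interpolates linearly using the index values as the x-axis. A sketch, assuming the new row has already been inserted (the frame is rebuilt here so the block runs on its own):

```python
import pandas as pd

# Frame after inserting the UK/50 row; its item values are still missing
df = pd.DataFrame({
    "Country": ["UK", "UK", "Asia", "USA"],
    "Sales": [28, 50, 75, 100],
    "Item_A": [20.0, None, 15.0, 30.0],
    "Item_B": [30.0, None, 20.0, 40.0],
})

items = ["Item_A", "Item_B"]
# With Sales as the index, method="index" weights each gap by distance in Sales
filled = df.set_index("Sales")[items].interpolate(method="index")
df[items] = filled.to_numpy()

print(df.round(1))
```

By default interpolate only fills forward, so a NaN in the first row is still left as NaN here.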

Please note that if the new row ends up as the first row of the dataframe, its values will still be NaN, because there is no previous row to average with. For that case I added a second fillna call, which you can replace with the value or calculation of your choice.

  Country  Sales  Item_A  Item_B
0      UK     10     NaN     NaN
1      UK     28    20.0    30.0
2      UK     50     NaN     NaN
3    Asia     75    15.0    20.0
4     USA    100    30.0    40.0

for col in df.select_dtypes(include="number").columns:
    df[col] = df[col].fillna((df[col].shift() + df[col].shift(-1)) / 2)
    # Fallback for NaNs the average could not fill (e.g. the first row): half the next row's value
    df[col] = df[col].fillna(df[col].shift(-1) / 2)

  Country  Sales  Item_A  Item_B
0      UK     10    10.0    15.0
1      UK     28    20.0    30.0
2      UK     50    17.5    25.0
3    Asia     75    15.0    20.0
4     USA    100    30.0    40.0
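If repeating the nearest known row is an acceptable choice for a row outside the known Sales range, numpy.interp handles both cases in one pass: it interpolates linearly on Sales inside the range and, by default, returns the first (or last) known value outside it. A sketch, assuming the five-row frame shown above:

```python
import numpy as np
import pandas as pd

# Frame from the edge case above: UK/10 and UK/50 rows inserted, items still NaN
df = pd.DataFrame({
    "Country": ["UK", "UK", "UK", "Asia", "USA"],
    "Sales": [10, 28, 50, 75, 100],
    "Item_A": [np.nan, 20.0, np.nan, 15.0, 30.0],
    "Item_B": [np.nan, 30.0, np.nan, 20.0, 40.0],
})

for col in ["Item_A", "Item_B"]:
    known = df[col].notna()
    # Linear in Sales between known rows; below the first known Sales value
    # np.interp returns the first known value (its default `left`), and above
    # the last one it returns the last known value (default `right`)
    df.loc[~known, col] = np.interp(
        df.loc[~known, "Sales"], df.loc[known, "Sales"], df.loc[known, col]
    )

print(df.round(1))
```

Which fallback is right (nearest value, extrapolation, or leaving NaN) depends on your data; np.interp's left and right arguments let you choose a constant instead.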

Upvotes: 1
