Reputation:
I have 2 data frames that I would like to return the values in a range (-1, 0, +1). One of the data frames contains Id's that i would like to look up and the other data frame contains Id's & values. For example, I want to lookup 99, 55, 117 in another data frame and return 100 99 98, 56 55 54, 118 117 116. As you can see it getting the values -1 and +1 of the Id's I would like to lookup. There is a better example below.
df = pd.DataFrame([[99],[55],[117]],columns = ['Id'])
df2 = pd.DataFrame([[100,1,2,4,5,6,8],
[87,1,6,20,22,23,34],
[99,1,12,13,34,45,46],
[64,1,10,14,29,32,33],
[55,1,22,13,23,33,35],
[66,1,6,7,8,9,10],
[77,1,2,3,5,6,8],
[811,1,2,5,6,8,10],
[118,1,7,8,22,44,56],
[117,1,66,44,47,87,91]],
columns = ['Id', 'Num1','Num2','Num3','Num4','Num5','Num6'])
I would like my result to something like this below.
results = pd.DataFrame([[87,1,6,20,22,23,34],
[99,1,12,13,34,45,46],
[64,1,10,14,29,32,33],
[64,1,10,14,29,32,33],
[55,1,22,13,23,33,35],
[66,1,6,7,8,9,10],
[118,1,7,8,22,44,56],
[117,1,66,44,47,87,91]],
columns = ['Id', 'Num1','Num2','Num3','Num4','Num5','Num6'])
Upvotes: 0
Views: 222
Reputation: 4823
import pandas as pd
import numpy as np
ind = df2[df2['Id'].isin(df['Id'])].index
aaa = np.array([[ind[i]-1,ind[i],ind[i]+1] for i in range(len(ind))]).ravel()
aaa = aaa[(aaa <= df2.index.values[-1]) & (aaa >= 0)]
df_test = df2.loc[aaa, :].reset_index().drop(['index'], axis=1)
print(df_test)
Output
Id Num1 Num2 Num3 Num4 Num5 Num6
0 87 1 6 20 22 23 34
1 99 1 12 13 34 45 46
2 64 1 10 14 29 32 33
3 64 1 10 14 29 32 33
4 55 1 22 13 23 33 35
5 66 1 6 7 8 9 10
6 118 1 7 8 22 44 56
7 117 1 66 44 47 87 91
Here, in the ind
list, indexes are obtained where there are the required Ids in df2.
The aaa
list creates ranges for these indexes, then the lists are wrapped in np.array
, ravel()
is used to concatenate them. Next, the list aaa
is overwritten, the elements that are greater than the maximum index df2
are removed.
Sampling occurs through loc.
Update 17.12.2022
if you need duplicate rows.
df = pd.DataFrame([[99], [55], [117], [117]], columns=['Id'])
lim_ind = df2.index[-1]
def my_func(i):
a = df2[df2['Id'].isin([i])].index.values
a = np.array([a - 1, a, a + 1]).ravel()
a = a[(a >= 0) & (a <= lim_ind)]
return a
qqq = [my_func(i) for i in df['Id']]
fff = np.array([df2.loc[qqq[i]].values for i in range(len(qqq))], dtype=object)
fff = np.vstack(fff)
result = pd.DataFrame(fff, columns=df2.columns)
print(result)
Output
Id Num1 Num2 Num3 Num4 Num5 Num6
0 87 1 6 20 22 23 34
1 99 1 12 13 34 45 46
2 64 1 10 14 29 32 33
3 64 1 10 14 29 32 33
4 55 1 22 13 23 33 35
5 66 1 6 7 8 9 10
6 118 1 7 8 22 44 56
7 117 1 66 44 47 87 91
8 118 1 7 8 22 44 56
9 117 1 66 44 47 87 91
Upvotes: 0