Cranjis
Cranjis

Reputation: 1960

Pandas dataframe fill values between indexes

I have list of indexes on a dataframe with 5 columns.

indexes = [5,9,44]

I want to create a pd.Series with the value of a certain columns between the indexes. So, if the

row#5 = ["a","b","c","d","e"]
row#9 = ["q","w","e","r","t"]
row#44= ["z","x","c","v","b"]

and the columns_idx is 3, I want a vector with "d" in indexes 0-5, "r" in indexes 5-9, and "v" in indexes 9-44, and nan in indexes 44+. What is the best way to do so?

Upvotes: 1

Views: 5616

Answers (1)

jezrael
jezrael

Reputation: 863611

Solution working if default RangeIndex - convert all values outside indexeses to NaN and back filling values:

indexes = [5,9,44]
columns_idx = 3

out = df.iloc[:, columns_idx].where(df.index.isin(indexes)).bfill()

Another solution:

df.iloc[:, columns_idx] = df.iloc[indexes, columns_idx]
out = df.iloc[:, columns_idx].bfill()

Sample:

np.random.seed(343)
df = pd.DataFrame(np.random.randint(100, size=(20, 5)), columns=list('abcde'))
print(df)
     a   b   c   d   e
0   63  18  56  22  50
1   22  12   4  38  23
2   65  19   2  13  51
3   43  90  46  44  36
4   74  43  92  34  18
5   39  24  62  21   5
6   19  90  51  79  77
7    3   0  51  26  22
8   36  54  25  22   4
9    5  66  76  98  65
10  42  95  89  49  98
11  84  82  88  87  35
12  63  58  17  72  92
13  78  47  71  23   3
14  37  60  82  96  66
15  18  83  93  82  82
16  34  77   2  50  17
17  63   8  78  74  89
18  25  86  83  42  77
19  82  35  77  46  42

indexes = [5,9,14]
columns_idx = 3

out = df.iloc[:, columns_idx].where(df.index.isin(indexes)).bfill()
print (out)
0     21.0
1     21.0
2     21.0
3     21.0
4     21.0
5     21.0
6     98.0
7     98.0
8     98.0
9     98.0
10    96.0
11    96.0
12    96.0
13    96.0
14    96.0
15     NaN
16     NaN
17     NaN
18     NaN
19     NaN
Name: d, dtype: float64

Explanation:

Compare index values for True per indices from indexes with isin:

print (df.index.isin(indexes))
[False False False False False  True False False False  True False False
 False False  True False False False False False]

Seelct column by position with DataFrame.iloc:

print (df.iloc[:, columns_idx])
0     22
1     38
2     13
3     44
4     34
5     21
6     79
7     26
8     22
9     98
10    49
11    87
12    72
13    23
14    96
15    82
16    50
17    74
18    42
19    46
Name: d, dtype: int32

Replace non matched values by where:

print (df.iloc[:, columns_idx].where(df.index.isin(indexes)))
0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
5     21.0
6      NaN
7      NaN
8      NaN
9     98.0
10     NaN
11     NaN
12     NaN
13     NaN
14    96.0
15     NaN
16     NaN
17     NaN
18     NaN
19     NaN
Name: d, dtype: float64

And last back filling missing values:

print (df.iloc[:, columns_idx].where(df.index.isin(indexes)).bfill())
0     21.0
1     21.0
2     21.0
3     21.0
4     21.0
5     21.0
6     98.0
7     98.0
8     98.0
9     98.0
10    96.0
11    96.0
12    96.0
13    96.0
14    96.0
15     NaN
16     NaN
17     NaN
18     NaN
19     NaN
Name: d, dtype: float64

Upvotes: 3

Related Questions