Reputation: 1960
I have list of indexes on a dataframe with 5 columns.
indexes = [5,9,44]
I want to create a pd.Series with the value of a certain columns between the indexes. So, if the
row#5 = ["a","b","c","d","e"]
row#9 = ["q","w","e","r","t"]
row#44= ["z","x","c","v","b"]
and the columns_idx is 3,
I want a vector with "d"
in indexes 0-5
, "r"
in indexes 5-9
, and "v"
in indexes 9-44
, and nan
in indexes 44+
.
What is the best way to do so?
Upvotes: 1
Views: 5616
Reputation: 863611
Solution working if default RangeIndex
- convert all values outside indexes
es to NaN
and back filling values:
indexes = [5,9,44]
columns_idx = 3
out = df.iloc[:, columns_idx].where(df.index.isin(indexes)).bfill()
Another solution:
df.iloc[:, columns_idx] = df.iloc[indexes, columns_idx]
out = df.iloc[:, columns_idx].bfill()
Sample:
np.random.seed(343)
df = pd.DataFrame(np.random.randint(100, size=(20, 5)), columns=list('abcde'))
print(df)
a b c d e
0 63 18 56 22 50
1 22 12 4 38 23
2 65 19 2 13 51
3 43 90 46 44 36
4 74 43 92 34 18
5 39 24 62 21 5
6 19 90 51 79 77
7 3 0 51 26 22
8 36 54 25 22 4
9 5 66 76 98 65
10 42 95 89 49 98
11 84 82 88 87 35
12 63 58 17 72 92
13 78 47 71 23 3
14 37 60 82 96 66
15 18 83 93 82 82
16 34 77 2 50 17
17 63 8 78 74 89
18 25 86 83 42 77
19 82 35 77 46 42
indexes = [5,9,14]
columns_idx = 3
out = df.iloc[:, columns_idx].where(df.index.isin(indexes)).bfill()
print (out)
0 21.0
1 21.0
2 21.0
3 21.0
4 21.0
5 21.0
6 98.0
7 98.0
8 98.0
9 98.0
10 96.0
11 96.0
12 96.0
13 96.0
14 96.0
15 NaN
16 NaN
17 NaN
18 NaN
19 NaN
Name: d, dtype: float64
Explanation:
Compare index values for True
per indices from indexes
with isin
:
print (df.index.isin(indexes))
[False False False False False True False False False True False False
False False True False False False False False]
Seelct column by position with DataFrame.iloc
:
print (df.iloc[:, columns_idx])
0 22
1 38
2 13
3 44
4 34
5 21
6 79
7 26
8 22
9 98
10 49
11 87
12 72
13 23
14 96
15 82
16 50
17 74
18 42
19 46
Name: d, dtype: int32
Replace non matched values by where
:
print (df.iloc[:, columns_idx].where(df.index.isin(indexes)))
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 21.0
6 NaN
7 NaN
8 NaN
9 98.0
10 NaN
11 NaN
12 NaN
13 NaN
14 96.0
15 NaN
16 NaN
17 NaN
18 NaN
19 NaN
Name: d, dtype: float64
And last back filling missing values:
print (df.iloc[:, columns_idx].where(df.index.isin(indexes)).bfill())
0 21.0
1 21.0
2 21.0
3 21.0
4 21.0
5 21.0
6 98.0
7 98.0
8 98.0
9 98.0
10 96.0
11 96.0
12 96.0
13 96.0
14 96.0
15 NaN
16 NaN
17 NaN
18 NaN
19 NaN
Name: d, dtype: float64
Upvotes: 3