Reputation: 5789
Consider this dataset:
cat > example_file.txt <<EOL
bins|1,1|2,2|3,4|5,6|7,9|10,12|13,18|19,24|25,36|37,54|55,81
1,1||431|501|651|861|1081|1461|1711|2071|2371|2531
2,2|||261|401|631|871|1291|1611|2001|2291|2431
3,4|||121|271|551|1011|1191|1511|1901|2211|2351
5,6||||101|361|901|1251|1301|1691|2011|2221
7,9|||||181|461|841|1151|1511|1821|2061
10,12||||||161|591|931|1291|1621|1821
13,18|||||||351|691|1091|1401|1571
19,24||||||||301|861|1451|1201
25,36|||||||||371|851|961
37,54||||||||||371|621
55,81|||||||||||351
EOL
which I import:
import pandas
example = pandas.read_csv('example_file.txt', sep = '|', index_col = 0)
Now, the row and column names of this table are actually interval bounds. I can have pandas recognize them as such:
col_bin = pandas.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.columns])
row_bin = pandas.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.index])
example.columns = col_bin
example.index = row_bin
Now, I would like to find the cell value corresponding to value pairs such as (11, 13).
For example, for (11, 13) the cell value is 591.
This is because the row index value (11) belongs to the interval (10,12) [this is the 6th row] and the col index value (13) belongs to the interval (13,18) [this is the 7th column] and the value at (row, column) (6, 7) in the table is 591.
Upvotes: 2
Views: 1578
Reputation: 862431
You need first add parameter closed='both'
to IntervalIndex.from_tuples
, because intevals are closed in both sides:
col_bin = pd.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.columns], closed='both')
row_bin = pd.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.index], closed='both')
example.columns = col_bin
example.index = row_bin
print (example)
[1, 1] [2, 2] [3, 4] [5, 6] [7, 9] [10, 12] [13, 18] \
[1, 1] NaN 431.0 501.0 651.0 861.0 1081.0 1461.0
[2, 2] NaN NaN 261.0 401.0 631.0 871.0 1291.0
[3, 4] NaN NaN 121.0 271.0 551.0 1011.0 1191.0
[5, 6] NaN NaN NaN 101.0 361.0 901.0 1251.0
[7, 9] NaN NaN NaN NaN 181.0 461.0 841.0
[10, 12] NaN NaN NaN NaN NaN 161.0 591.0
[13, 18] NaN NaN NaN NaN NaN NaN 351.0
[19, 24] NaN NaN NaN NaN NaN NaN NaN
[25, 36] NaN NaN NaN NaN NaN NaN NaN
[37, 54] NaN NaN NaN NaN NaN NaN NaN
[55, 81] NaN NaN NaN NaN NaN NaN NaN
[19, 24] [25, 36] [37, 54] [55, 81]
[1, 1] 1711.0 2071.0 2371.0 2531
[2, 2] 1611.0 2001.0 2291.0 2431
[3, 4] 1511.0 1901.0 2211.0 2351
[5, 6] 1301.0 1691.0 2011.0 2221
[7, 9] 1151.0 1511.0 1821.0 2061
[10, 12] 931.0 1291.0 1621.0 1821
[13, 18] 691.0 1091.0 1401.0 1571
[19, 24] 301.0 861.0 1451.0 1201
[25, 36] NaN 371.0 851.0 961
[37, 54] NaN NaN 371.0 621
[55, 81] NaN NaN NaN 351
And then get positions by IntervalIndex.get_loc
and select by DataFrame.iat
:
tup = (11, 13)
pos1 = example.index.get_loc(tup[0])
pos2 = example.columns.get_loc(tup[1])
print (example.iat[pos1, pos2])
591.0
Upvotes: 2