Reputation: 95
I am trying to select a slice from a column in a dataframe based on a condition applied to a second, same-sized dataframe. I do not use column names and therefore need a solution that (I think) uses iloc.
Here is my code:
import pandas as pd
permnos = pd.read_csv('Ranks.csv')
total_cols = len(permnos.columns)
pntls = permnos.copy(deep=True)
for i in range(total_cols):
    for j in range(pntls.iloc[:,i].count()):
        pntls.iloc[j,i] = (j+1)/pntls.iloc[:,i].count()
print((pntls.iloc[:,0] > 0.1) & (pntls.iloc[:,0] <= 0.2))
#print(permnos.iloc[(pntls.iloc[:,0] > 0.1) & (pntls.iloc[:,0] <= 0.2), 0])
The dataframe permnos contains identifiers (5-digit numbers) in columns of various lengths. In each column, these identifiers are sorted (done in a separate program) from "best" to "worst". The for loop fills the dataframe pntls so that each element in each column of permnos gets a percent rank based on its position. That is, the values in each column of pntls range from ~0 to 1, in ascending order. Through this step, all is well.
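As an aside, the positional percent rank described above could also be computed without the nested loop. The following is only a sketch: the toy data stands in for Ranks.csv, and it assumes that shorter columns are padded with NaN at the bottom, which is how the loop above treats them.

```python
import numpy as np
import pandas as pd

# Toy stand-in for Ranks.csv (assumption): columns of different lengths,
# with the shorter column padded by NaN at the bottom.
permnos = pd.DataFrame({
    "x": [10001, 10002, 10003, 10004],
    "y": [20001, 20002, np.nan, np.nan],
})

# Number of non-missing identifiers in each column.
counts = permnos.count()

# 1-based row positions, shared by every column.
positions = pd.Series(np.arange(1, len(permnos) + 1), index=permnos.index)

# Percent rank = position / column count, kept only where an identifier exists.
pntls = pd.DataFrame(
    {col: positions / counts[col] for col in permnos.columns}
).where(permnos.notna())

print(pntls)
```

Rows past the end of a column stay NaN, so later comparisons such as `pntls.iloc[:, 0] > 0.1` evaluate to False there.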
In the line currently commented out (the problem), I am trying to print the elements in column 0 of permnos for which the values in column 0 of pntls are greater than 0.1 and less than or equal to 0.2. [Note: Once I figure out how to actually select the desired elements, I have code that will use .tolist() to add the slice to a list; I think I have that bit sorted out if I can get the slice]. The commented-out line yields:
NotImplementedError: iLocation based boolean indexing on an integer type is not available
To give an example of my desired output, suppose that in column 0 of pntls the values between 0.1 and 0.2 are in rows 5 through 8, inclusive. Then I'd like to return the slice permnos.iloc[5:9, 0] from permnos.
Is it possible to "communicate" between the two dataframes? If so, any help is greatly appreciated.
Upvotes: 0
Views: 1257
Reputation: 862641
I think "communicating" between the two dataframes directly is not possible, because they are different objects. But if they have the same length and the same index, you can create a boolean mask from one df and apply it to the second one.
Filtering is also possible if the indexes differ (but the lengths match). In that case, add .values to the mask to get a boolean NumPy array, like permnos[m.values].iloc[:, 0].
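To illustrate the different-index case, here is a small sketch. The letter index on permnos is made up for the example; the point is only that a plain boolean array is applied by position, so the two indexes do not need to match.

```python
import pandas as pd

# Percent ranks with the default integer index 0..5.
pntls = pd.DataFrame({"B": [0.2, 0.1, 0.12, 0.4, 0.15, 0.4]})

# Same length, but a different (hypothetical) index.
permnos = pd.DataFrame({"a": [7, 8, 9, 4, 2, 3]}, index=list("uvwxyz"))

# Boolean Series built from pntls; its index is 0..5.
m = (pntls.iloc[:, 0] > 0.1) & (pntls.iloc[:, 0] <= 0.2)

# .values drops the index, so the mask is matched to rows by position.
selected = permnos[m.values].iloc[:, 0]
print(selected)
```

Without .values, pandas would try to align the index of m with the index of permnos, and that fails when the two indexes differ.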
pntls = pd.DataFrame({'B':[.2,.1,.12,.4,.15,.4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4]})
print (pntls)
      B  C  D  E
0  0.20  7  1  5
1  0.10  8  3  3
2  0.12  9  5  6
3  0.40  4  7  9
4  0.15  2  1  2
5  0.40  3  0  4
m = (pntls.iloc[:,0] > 0.1) & (pntls.iloc[:,0] <= 0.2)
print (m)
0     True
1    False
2     True
3    False
4     True
5    False
Name: B, dtype: bool
permnos = pd.DataFrame({'a':[7,8,9,4,2,3],
'b':[1,3,5,7,1,0],
'c':[1,3,5,7,1,0],
'd':[5,3,6,9,2,4]})
print (permnos)
   a  b  c  d
0  7  1  1  5
1  8  3  3  3
2  9  5  5  6
3  4  7  7  9
4  2  1  1  2
5  3  0  0  4
First filter by boolean indexing and then select the column by position:
df = permnos[m].iloc[:, 0]
print (df)
0 7
2 9
4 2
Name: a, dtype: int64
Alternatively, use loc and look up the column name by position with permnos.columns[0]:
df = permnos.loc[m, permnos.columns[0]]
print (df)
0 7
2 9
4 2
Name: a, dtype: int64
Your original solution raises an error, because iloc does not accept a boolean Series as a mask:
df = permnos.iloc[m, 0]
print (df)
NotImplementedError: iLocation based boolean indexing on an integer type is not available
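If a purely positional iloc call is still preferred, one option is to convert the mask to a plain NumPy boolean array first, since iloc accepts boolean arrays but not boolean Series. A minimal sketch using cut-down versions of the sample frames above, with the .tolist() step from the question added at the end:

```python
import pandas as pd

pntls = pd.DataFrame({"B": [0.2, 0.1, 0.12, 0.4, 0.15, 0.4]})
permnos = pd.DataFrame({"a": [7, 8, 9, 4, 2, 3],
                        "b": [1, 3, 5, 7, 1, 0]})

# Boolean Series: True where the percent rank is in (0.1, 0.2].
m = (pntls.iloc[:, 0] > 0.1) & (pntls.iloc[:, 0] <= 0.2)

# to_numpy() strips the index, leaving a boolean array that iloc accepts.
selected = permnos.iloc[m.to_numpy(), 0]

# Convert the selection to a plain Python list, as planned in the question.
as_list = selected.tolist()
print(as_list)
```

Because the mask is applied by position, this form needs neither column names nor matching indexes, only the same number of rows in both dataframes.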
Upvotes: 1