CodingNewb

Reputation: 95

Select dataframe slice using condition on a separate dataframe

I am trying to select a slice from a column in a dataframe based on a condition applied to a second, same-sized dataframe. I do not use column names and therefore need a solution that (I think) uses iloc.

Here is my code:

import pandas as pd

permnos = pd.read_csv('Ranks.csv')
total_cols = len(permnos.columns)

pntls = permnos.copy(deep=True)

for i in range(total_cols):
    for j in range(pntls.iloc[:,i].count()):
        pntls.iloc[j,i] = (j+1)/pntls.iloc[:,i].count()

print((pntls.iloc[:,0] > 0.1) & (pntls.iloc[:,0] <= 0.2))

#print(permnos.iloc[(pntls.iloc[:,0] > 0.1) & (pntls.iloc[:,0] <= 0.2), 0])

The dataframe permnos contains identifiers (5 digit numbers) with columns of various row lengths. In each column, these identifiers are sorted (done in a separate program) from "best" to "worst". The for loop writes the dataframe pntls such that a percent rank is created for each element in each column of permnos based on position. That is, the values in each column of pntls range from ~0 to 1, in ascending order. Through this step, all is well.

In my line currently commented out (the problem), I am trying to print the elements in column 0 of permnos for which the values in column 0 of pntls are greater than 0.1 and less than or equal to 0.2. [Note: Once I figure out how to actually select the desired elements, I have code that will use .tolist() to add the slice to a list; I think I have that bit sorted out if I can get the slice]. The currently commented out line yields:

NotImplementedError: iLocation based boolean indexing on an integer type is not available

To give an example of my desired output, suppose that in column 0 of pntls the values between 0.1 and 0.2 are in rows 5 through 8, inclusive. Then I'd like to return the slice from permnos of permnos.iloc[5:9, 0].

Is it possible to "communicate" between the two dataframes? If so, any help is greatly appreciated.

Upvotes: 0

Views: 1257

Answers (1)

jezrael

Reputation: 862641

I don't think the two dataframes can "communicate" directly, because they are separate objects. But if they have the same length and the same index, you can create a boolean mask from one and apply it to the other.

Filtering also works if the indexes differ, as long as the lengths match. In that case add .values to the mask to turn it into a boolean NumPy array, e.g. permnos[m.values].iloc[:, 0].
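A minimal sketch of the different-index case, assuming made-up data: the index labels 'u' to 'z' and the single columns are hypothetical and only there to make the two indexes differ.

```python
import pandas as pd

# Hypothetical data: same length, but the index labels do not match.
pntls = pd.DataFrame({'B': [.2, .1, .12, .4, .15, .4]})
permnos = pd.DataFrame({'a': [7, 8, 9, 4, 2, 3]},
                       index=list('uvwxyz'))

m = (pntls.iloc[:, 0] > 0.1) & (pntls.iloc[:, 0] <= 0.2)

# The mask carries the index 0..5, which cannot be aligned with 'u'..'z',
# so pass the underlying NumPy array instead and filter by position.
selected = permnos[m.values].iloc[:, 0]
print(selected)
```

The selected rows keep the labels from permnos, because only the positions of the True values in the mask are used.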

pntls  = pd.DataFrame({'B':[.2,.1,.12,.4,.15,.4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4]})

print (pntls)
      B  C  D  E
0  0.20  7  1  5
1  0.10  8  3  3
2  0.12  9  5  6
3  0.40  4  7  9
4  0.15  2  1  2
5  0.40  3  0  4

m = (pntls.iloc[:,0] > 0.1) & (pntls.iloc[:,0] <= 0.2)
print (m)
0     True
1    False
2     True
3    False
4     True
5    False
Name: B, dtype: bool

permnos  = pd.DataFrame({'a':[7,8,9,4,2,3],
                   'b':[1,3,5,7,1,0],
                   'c':[1,3,5,7,1,0],
                   'd':[5,3,6,9,2,4]})

print (permnos)
   a  b  c  d
0  7  1  1  5
1  8  3  3  3
2  9  5  5  6
3  4  7  7  9
4  2  1  1  2
5  3  0  0  4

First filter the rows by boolean indexing and then select the column by position:

df = permnos[m].iloc[:, 0]
print (df)
0    7
2    9
4    2
Name: a, dtype: int64

Or use loc and look up the column name by position in permnos.columns:

df = permnos.loc[m, permnos.columns[0]]
print (df)
0    7
2    9
4    2
Name: a, dtype: int64

Your original solution raises an error, because iloc does not accept a boolean Series as the mask:

df = permnos.iloc[m, 0]
print (df)

NotImplementedError: iLocation based boolean indexing on an integer type is not available
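If you prefer to stay with iloc, it should work once the mask is passed as a plain boolean NumPy array rather than a Series. A small sketch with trimmed-down versions of the sample frames above; the names result and ids are only for illustration, and the .tolist() call is the step mentioned in the question.

```python
import pandas as pd

# Trimmed-down versions of the sample frames used in this answer.
pntls = pd.DataFrame({'B': [.2, .1, .12, .4, .15, .4]})
permnos = pd.DataFrame({'a': [7, 8, 9, 4, 2, 3],
                        'b': [1, 3, 5, 7, 1, 0]})

m = (pntls.iloc[:, 0] > 0.1) & (pntls.iloc[:, 0] <= 0.2)

# .values drops the index, leaving a boolean NumPy array that iloc accepts.
result = permnos.iloc[m.values, 0]

# Convert the selected slice to a plain Python list.
ids = result.tolist()
print(ids)
```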

Upvotes: 1
