Cesar
Cesar

Reputation: 617

Sum of count where values are less than row

I'm using Pandas to come up with new column that will search through the entire column with values [1-100] and will count the values where it's less than the current row.

See [df] example below:

[A][NewCol]
 1 0
 3 2
 2 1
 5 4
 8 5
 3 2

Essentially, for each row I need to look at the entire Column A, and count how many values are less than the current row. So for Value 5, there are 4 values that are less (<) than 5 (1,2,3,3).

What would be the easiest way of doing this?

Thanks!

Upvotes: 3

Views: 1566

Answers (6)

BENY
BENY

Reputation: 323396

I am using numpy broadcast

s=df.A.values
(s[:,None]>s).sum(1)
Out[649]: array([0, 2, 1, 4, 5, 2])

#df['NewCol']=(s[:,None]>s).sum(1)

timing

df=pd.concat([df]*1000)

%%timeit
s=df.A.values
(s[:,None]>s).sum(1)
10 loops, best of 3: 83.7 ms per loop
%timeit (df['A'].rank(method='min') - 1).astype(int)
1000 loops, best of 3: 479 µs per loop

Upvotes: 7

Scott Boston
Scott Boston

Reputation: 153550

One way to do it like this, use rank with method='min':

df['NewCol'] = (df['A'].rank(method='min') - 1).astype(int)

Output:

   A  NewCol
0  1       0
1  3       2
2  2       1
3  5       4
4  8       5
5  3       2

Upvotes: 7

SolverWorld
SolverWorld

Reputation: 675

You didn't specify if speed or memory usage was important (or if you had a very large dataset). The "easiest" way to do it is straightfoward: calculate how many are less then i for each entry in the column and collect those into a new column:

df=pd.DataFrame({'A': [1,3,2,5,8,3]})
col=df['A']
df['new_col']=[ sum(col<i) for i in col ]

print(df)

Result:

   A  new_col
0  1        0
1  3        2
2  2        1
3  5        4
4  8        5
5  3        2

There might be more efficient ways to do this on large datasets, such as sorting your column first.

Upvotes: 1

anky
anky

Reputation: 75150

Another way is sort and reset index:

m=df.A.sort_values().reset_index(drop=True).reset_index()
m.columns=['new','A']
print(m)

   new  A
0    0  1
1    1  2
2    2  3
3    3  3
4    4  5
5    5  8

Upvotes: 1

Nathaniel
Nathaniel

Reputation: 3290

You can do it this way:

import pandas as pd

df = pd.DataFrame({'A': [1,3,2,5,8,3]})

df['NewCol'] = 0
for idx, row in df.iterrows():
    df.loc[idx, 'NewCol'] = (df.loc[:, 'A'] < row.A).sum()

print(df)
   A  NewCol
0  1       0
1  3       2
2  2       1
3  5       4
4  8       5
5  3       2

Upvotes: 1

SdahlSean
SdahlSean

Reputation: 583

Try this code

A = [Your numbers]
less_than = []
    for element in A:
        counter = 0
        for number in A:
            if number < element:
                counter += 1
        less_than.append(counter)

Upvotes: 1

Related Questions