ali

Reputation: 71

How to efficiently check conditions on two columns and perform an operation on a third column in Python

I have three columns with thousands of rows. The values in columns 1 and 2 range from 1 to 6. I want to check the combination of values in columns 1 and 2 and, depending on that combination, divide the value in column 3 by a certain number.

1     2    3.036010    
1     3    2.622544    
3     1    2.622544    
1     2    3.036010    
2     1    3.036010  

Furthermore, column 3 should be divided by the same number when the values in columns 1 and 2 are swapped. For example, the combinations 1 2 and 2 1 share the same divisor. My present approach does the job, but I would have to write several conditions by hand. What would be a more efficient way to perform this task? Thanks in advance!

import numpy as np

my_data = np.loadtxt('abc.dat')

for row in my_data:
    if row[0] == 1 and row[1] == 2:
        row[2] /= some_value  # column 3 is index 2; each row is a view, so my_data is updated
   



  

Upvotes: 0

Views: 497

Answers (4)

Ali_Sh

Reputation: 2826

If you want to combine several conditions as in your code, you can use the operator & for "and" or | for "or" inside np.where:

cond1 = my_data[:, 0] == 1                    # Boolean mask: True where the first condition holds
cond2 = my_data[:, 1] == 2                    # Boolean mask: True where the second condition holds
some_value = 10
indices = np.where(cond1 & cond2)[0]          # row indices where both conditions hold
# indices = np.where(cond1 | cond2)[0]        # row indices where at least one condition holds
result = my_data[:, 2][indices] / some_value  # the operation is applied only to the selected rows

And if you want to modify the third column (index 2) in place, as in Serge Ballesta's answer:

my_data[:, 2][indices] = my_data[:, 2][indices] / some_value

np.logical_and and np.logical_or are functions that can handle such conditions, too. When there are more than two conditions, they must be used as np.logical_and.reduce and np.logical_or.reduce.
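As a minimal sketch of the reduce form mentioned above, the snippet below combines three conditions in one call. The sample rows, the third condition and the divisor are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical sample data in the question's three-column layout
my_data = np.array([
    [1, 2, 3.036010],
    [1, 3, 2.622544],
    [3, 1, 2.622544],
    [2, 1, 3.036010],
])
some_value = 10  # placeholder divisor

# Three conditions combined with a logical AND in a single call
conds = [my_data[:, 0] == 1, my_data[:, 1] == 2, my_data[:, 2] > 3]
all_true = np.logical_and.reduce(conds)

# Chaining the & operator gives the same mask
same_mask = conds[0] & conds[1] & conds[2]

# Divide column 3 only for the rows where every condition holds
result = my_data[all_true, 2] / some_value
```

The reduce form is mostly a convenience when the list of conditions is built programmatically; for two or three fixed conditions, chaining & reads just as well.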

Upvotes: 1

Serge Ballesta

Reputation: 149185

NumPy offers np.where, which allows a vectorized test:

result = np.where(data[:, 0] == data[:, 1], data[:, 2]/some_value, data[:, 2])

or if you want to change the array in place:

data[:, 2] = np.where(data[:, 0] == data[:, 1], data[:, 2]/some_value, data[:, 2])
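When several combinations each need their own divisor, np.select might be a natural extension of the np.where pattern. The following is only a sketch: the sample rows and the divisors 10.0 and 4.0 are placeholders, and each condition is written so that swapped pairs such as 1 2 and 2 1 match the same entry.

```python
import numpy as np

# Hypothetical sample data in the question's three-column layout
data = np.array([
    [1, 2, 3.036010],
    [1, 3, 2.622544],
    [3, 1, 2.622544],
    [2, 1, 3.036010],
    [4, 5, 1.0],
])

a, b = data[:, 0], data[:, 1]

# One condition per unordered pair; the divisors are made-up placeholders
conditions = [
    ((a == 1) & (b == 2)) | ((a == 2) & (b == 1)),
    ((a == 1) & (b == 3)) | ((a == 3) & (b == 1)),
]
divisors = [10.0, 4.0]

# Rows matching no condition get a divisor of 1.0, so they stay unchanged
divisor = np.select(conditions, divisors, default=1.0)
result = data[:, 2] / divisor
```

Assigning the result back with data[:, 2] = result would update the array in place, in the same way as the np.where version.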

Upvotes: 1

Kevin

Reputation: 3368

You could use a mask for this:

import numpy as np
# randint's upper bound is exclusive, so 7 yields the values 1 to 6 from the question
my_data = np.column_stack([np.random.randint(1, 7, (1000, 2)), np.random.randn(1000)])
some_value = 123

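mask = my_data[:, 0] == my_data[:, 1]  # True where columns 1 and 2 are equal
# divide column 3 in place, only where the mask is True
my_data[mask, 2] /= some_value

The result is stored in my_data.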
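To avoid writing one mask per combination, a symmetric lookup table indexed by the two columns may be an option. This is a sketch under assumptions: the pair_divisors mapping and its values are hypothetical, and pairs that are not listed are left unchanged (divisor 1).

```python
import numpy as np

# The five sample rows shown in the question
my_data = np.array([
    [1, 2, 3.036010],
    [1, 3, 2.622544],
    [3, 1, 2.622544],
    [1, 2, 3.036010],
    [2, 1, 3.036010],
])

# Hypothetical divisors per unordered pair; unlisted pairs keep a divisor of 1
pair_divisors = {(1, 2): 10.0, (1, 3): 4.0}

# 7x7 table so that the values 1 to 6 can be used directly as indices
table = np.ones((7, 7))
for (i, j), d in pair_divisors.items():
    table[i, j] = table[j, i] = d  # symmetric: (i, j) and (j, i) share a divisor

# Look up the divisor for every row at once and divide column 3 in place
rows = my_data[:, 0].astype(int)
cols = my_data[:, 1].astype(int)
my_data[:, 2] /= table[rows, cols]
```

Because the table is filled symmetrically, each combination is listed only once, and the lookup itself needs no explicit loop over the data rows.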

Upvotes: 1

Ziur Olpa

Reputation: 2133

pandas may be more suitable for this task: you can define conditions and apply them to tabular data without any explicit loop.
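A rough sketch of what that could look like, assuming the data is loaded into a DataFrame. The column names "a", "b" and "value", the sample rows and the divisor are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in the question's three-column layout
df = pd.DataFrame(
    [[1, 2, 3.036010], [1, 3, 2.622544], [3, 1, 2.622544], [2, 1, 3.036010]],
    columns=["a", "b", "value"],
)
some_value = 10  # placeholder divisor

# Order-independent test: take the smaller and larger value of each (a, b) pair,
# so that 1 2 and 2 1 are treated as the same combination
lo = df[["a", "b"]].min(axis=1)
hi = df[["a", "b"]].max(axis=1)
mask = (lo == 1) & (hi == 2)

# Divide the third column only for the matching rows
df.loc[mask, "value"] /= some_value
```

The Boolean mask works the same way as in NumPy; df.loc just adds label-based column selection on top.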

Upvotes: 0
