Elhanan Schwarts

Reputation: 383

Comparing values of a particular column in a NumPy array and returning the indexes of the rows where the values are equal

The following NumPy array is given:

path = np.array([['S','A','N','V','T'],
                 ['S','R','Z','V','W'],
                 ['S','D','C','E','Y'],
                 ['S','W','C','E','Y'],
                 ['S','Q','R','E','B'],
                 ['S','Q','R','Z','Z']])

I need to compare the values in column 3 and return lists of the row indexes where the values are equal. In the example above, two values repeat in column 3 (V and E), so the returned result should be:

[[0,1],[2,3,4]]

Upvotes: 0

Views: 258

Answers (4)

Aaj Kaal

Reputation: 1304

Code:

import numpy as np
import pandas as pd

path = np.array([['S','A','N','V','T'],
                 ['S','R','Z','V','W'],
                 ['S','D','C','E','Y'],
                 ['S','W','C','E','Y'],
                 ['S','Q','R','E','B'],
                 ['S','Q','R','Z','Z']])
                 
df = pd.DataFrame(path, columns=['a','b','c','d','e'])
# Group the row labels by the value in column 'd' (column 3 of the array).
print(df.groupby('d').groups)

Output:

{'E': [2, 3, 4], 'V': [0, 1], 'Z': [5]}
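
The `groups` mapping also contains values that occur only once (here `Z`) and is keyed by value rather than being a list of lists. A minimal sketch of converting it to the format the question asks for, assuming that only values shared by more than one row are wanted and that the groups should be ordered by their first row:

```python
import numpy as np
import pandas as pd

path = np.array([['S','A','N','V','T'],
                 ['S','R','Z','V','W'],
                 ['S','D','C','E','Y'],
                 ['S','W','C','E','Y'],
                 ['S','Q','R','E','B'],
                 ['S','Q','R','Z','Z']])

df = pd.DataFrame(path, columns=['a','b','c','d','e'])
groups = df.groupby('d').groups

# Keep only the values shared by more than one row.
repeated = [idx.tolist() for idx in groups.values() if len(idx) > 1]

# Order the groups by their first row index (an assumption about the
# desired ordering; the question only shows one example).
result = sorted(repeated, key=lambda rows: rows[0])
print(result)  # [[0, 1], [2, 3, 4]]
```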

Upvotes: 0

trigonom

Reputation: 528

If you don't know the repeating values in advance:

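# Map each value in column 3 to the list of rows where it appears.
value_dict = {}
for i in range(path.shape[0]):
    if path[i, 3] in value_dict:
        value_dict[path[i, 3]].append(i)
    else:
        value_dict[path[i, 3]] = [i]

# Keep only the values that appear in more than one row.
index_list = []
for k in value_dict:
    if len(value_dict[k]) > 1:
        index_list.append(value_dict[k])

The resulting index_list is [[0, 1], [2, 3, 4]]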
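
The same idea can be wrapped in a reusable function. This is only a sketch: the name `repeated_row_groups` and its parameters are hypothetical, and `collections.defaultdict` is swapped in to avoid the explicit membership check.

```python
from collections import defaultdict

import numpy as np


def repeated_row_groups(arr, col):
    """Return lists of row indexes that share a value in column `col`.

    Hypothetical helper, not part of NumPy. Values that occur only
    once are dropped; groups keep the order of first appearance.
    """
    rows_by_value = defaultdict(list)
    for i, value in enumerate(arr[:, col]):
        rows_by_value[value].append(i)
    return [rows for rows in rows_by_value.values() if len(rows) > 1]


path = np.array([['S','A','N','V','T'],
                 ['S','R','Z','V','W'],
                 ['S','D','C','E','Y'],
                 ['S','W','C','E','Y'],
                 ['S','Q','R','E','B'],
                 ['S','Q','R','Z','Z']])

print(repeated_row_groups(path, 3))  # [[0, 1], [2, 3, 4]]
```

Passing the column as an argument means the same function works for any column, for example `repeated_row_groups(path, 1)` groups the two rows that share `Q`.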

Upvotes: 1

gold_cy

Reputation: 14236

Use np.where to find the results you are looking for.

path = np.array([['S','A','N','V','T'],
                 ['S','R','Z','V','W'],
                 ['S','D','C','E','Y'],
                 ['S','W','C','E','Y'],
                 ['S','Q','R','E','B'],
                 ['S','Q','R','Z','Z']])

# Restrict the comparison to column 3 so matches in other columns are ignored.
vs = np.where(path[:, 3] == "V")[0]
es = np.where(path[:, 3] == "E")[0]

idxs = [vs, es]

print(idxs)
>> [array([0, 1]), array([2, 3, 4])]

You can also use np.argwhere.

idxs = [np.argwhere(path[:, 3] == "V")[:, 0], np.argwhere(path[:, 3] == "E")[:, 0]]

print(idxs)
>> [array([0, 1]), array([2, 3, 4])]
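
Both variants hard-code the values `"V"` and `"E"`. A possible sketch that discovers the repeated values first, assuming `np.unique` with `return_counts=True` is acceptable and that groups should be ordered by their first row:

```python
import numpy as np

path = np.array([['S','A','N','V','T'],
                 ['S','R','Z','V','W'],
                 ['S','D','C','E','Y'],
                 ['S','W','C','E','Y'],
                 ['S','Q','R','E','B'],
                 ['S','Q','R','Z','Z']])

column = path[:, 3]

# np.unique returns the distinct values in sorted order with their counts.
values, counts = np.unique(column, return_counts=True)
repeated = values[counts > 1]

# One array of row indexes per repeated value.
idxs = [np.flatnonzero(column == v) for v in repeated]

# np.unique sorts by value, so reorder the groups by first row index.
idxs.sort(key=lambda rows: rows[0])

print([rows.tolist() for rows in idxs])  # [[0, 1], [2, 3, 4]]
```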

Upvotes: 1

Ironkey

Reputation: 2607

Try this out:

import numpy as np

path = np.array([['S','A','N','V','T'],
                 ['S','R','Z','V','W'],
                 ['S','D','C','E','Y'],
                 ['S','W','C','E','Y'],
                 ['S','Q','R','E','B'],
                 ['S','Q','R','Z','Z']])

path = path[:,3] # 3 being the column you want

indexes = [[x for x,i in enumerate(path) if i == "V"], [x for x,i in enumerate(path) if i == "E"]] 

print(indexes)
[[0, 1], [2, 3, 4]]
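
If the repeated values are not known up front, the same list comprehension can be driven by a count of the column. A rough sketch, assuming `collections.Counter` is used to find values that occur more than once (the name `column` is introduced here so that `path` is not overwritten):

```python
from collections import Counter

import numpy as np

path = np.array([['S','A','N','V','T'],
                 ['S','R','Z','V','W'],
                 ['S','D','C','E','Y'],
                 ['S','W','C','E','Y'],
                 ['S','Q','R','E','B'],
                 ['S','Q','R','Z','Z']])

column = path[:, 3].tolist()  # 3 being the column you want

# Count how often each value occurs; Counter keeps first-seen order.
counts = Counter(column)

# Build one index list per value that occurs more than once.
indexes = [[x for x, i in enumerate(column) if i == value]
           for value, n in counts.items() if n > 1]

print(indexes)  # [[0, 1], [2, 3, 4]]
```

This rescans the column once per repeated value, which is fine for small arrays but slower than a single-pass dictionary for large ones.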

Upvotes: 0
