Yassin

Reputation: 316

Split disorganized arrays with numpy

I am using the code below to read arrays from CSV files.

x,y = np.loadtxt(filename, delimiter=';', unpack=True, skiprows=1, usecols=(1,2))

Here x is an array like [5,5,5,0,1,1,2,3,3,4,5,5,5] and y is [111.0,111.1,111.2,111.3,111.4,111.5...]

I want to split both arrays into groups according to the values in x. So my expected output would be something like this:

[1,1,1,1,1..][111.4,111.5,111.6...]
[2,2,2,2,..][111.10,111.11,111.12...]
[5,5,5,5,5,...][111.0,111.1,111.2...111.20,111.21,111.22]
...

That way I can pick an x value and get back its corresponding y values.

I've tried np.split, e.g. np.split(x,[21,1,2,3...]), but it doesn't seem to work for me.

Upvotes: 0

Views: 75

Answers (1)

drompix

Reputation: 566

My solution is probably not the most efficient one performance-wise, but you can use it as a starting point for further investigation:

import numpy as np

# some dummy data
x = np.array([5,5,5,0,1,1,2,3,3,4,5,5,5])
y = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12])

def split_by_ids(data: np.ndarray, ids: np.ndarray):
    splits = []  # result storage
    # get the unique ids and how often each one occurs in the ids array
    elems, counts = np.unique(ids, return_counts=True)
    # go through each unique id and its count
    for index, count in zip(elems, counts):
        # build an array repeating the id, and select the matching values from data
        splits.append((np.repeat(index, count), data[ids == index]))

    return splits

split_result = split_by_ids(y, x)
for ids, values in split_result:
    print(f'Ids: {ids}, Values: {values}')

The code above prints:

Ids: [0], Values: [3]
Ids: [1 1], Values: [4 5]
Ids: [2], Values: [6]
Ids: [3 3], Values: [7 8]
Ids: [4], Values: [9]
Ids: [5 5 5 5 5 5], Values: [ 0  1  2 10 11 12]
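
If the loop with a boolean mask per id becomes a bottleneck, one possible alternative is to sort by id once and cut the sorted data with np.split. Note that np.split takes increasing cut positions, not group labels, which is probably why the np.split attempt in the question did not behave as expected. The following is only a sketch under those assumptions and has not been benchmarked; group_by_ids is a hypothetical helper name, and it returns a dict so that the y values for a given x value can be looked up directly.

```python
import numpy as np

# same dummy data as above
x = np.array([5, 5, 5, 0, 1, 1, 2, 3, 3, 4, 5, 5, 5])
y = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

def group_by_ids(data, ids):
    # hypothetical helper: maps each unique id to its values from data
    # a stable sort keeps the original order of values within each group
    order = np.argsort(ids, kind="stable")
    sorted_ids = ids[order]
    # return_index gives the position where each unique id first appears
    # in the sorted array, i.e. where each group starts
    keys, starts = np.unique(sorted_ids, return_index=True)
    # np.split needs increasing cut positions, so drop the leading 0
    groups = np.split(data[order], starts[1:])
    return {key.item(): group for key, group in zip(keys, groups)}

groups = group_by_ids(y, x)
print(groups[5])  # y values whose x equals 5
```

With the dummy data, groups[5] holds the same values as the last line of the output above. When x is read with np.loadtxt it is a float array, so the keys are floats such as 5.0; looking up groups[5] still works because 5 and 5.0 compare and hash equal in Python.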

Upvotes: 1
