maynull

Reputation: 2046

How to make a numpy array with some conditions?

I have a numpy structured array that looks like this:

  idx lvl start   end
   60  71  10.0   0.0
   60  72   0.0  25.0
   60  73   0.0  35.0
   61  73   5.0   0.0
   65  71   5.0   0.0
   67  72   5.0   0.0
   67  74   0.0  10.0
   ...

I want to build a new array from it, subject to two conditions.

1) Only idx values that have at least one start value and at least one end value are used (idx 60 and 67 in this example).

2) If an idx has multiple start or end values, only the level of the smallest start value and the level of the largest end value are used (idx 60 gives 71 and 73).

The result will look like this:

idx start_lvl end_lvl
 60        71      73
 67        72      74

I don't mind using pandas, but I'd like to avoid creating additional arrays or using loops. Is there a simple way to do this?

Upvotes: 1

Views: 64

Answers (1)

jezrael

Reputation: 863166

First keep only the rows whose idx occurs more than once, using Series.duplicated with keep=False. Then set lvl as the index, so that DataFrameGroupBy.idxmax returns, for each idx, the lvl of the row that holds the maximum of each column:

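import pandas as pd

# create a DataFrame from the structured array, thanks @SpghttCd
df = pd.DataFrame(struct_arr)

# keep df unchanged so it can be reused; store the grouped result separately
df1 = df[df['idx'].duplicated(keep=False)].set_index('lvl').groupby('idx').idxmax()
print(df1)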
     start  end
idx            
60      71   73
67      72   74

According to the description, start needs idxmin instead. It returns the first minimum, and the 0.0 entries count as the minimum here:

df2 = (df[df['idx'].duplicated(keep=False)]
       .set_index('lvl')
       .groupby('idx')
       .agg({'start': 'idxmin', 'end': 'idxmax'}))
print(df2)
     start  end
idx            
60      72   73
67      74   74
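
If 0.0 is meant as "no value" (an assumption on my part, the question does not say so explicitly), one possible variation is to ignore the zeros before taking idxmin / idxmax and then keep only the idx values present on both sides. This is a minimal sketch, not a tested solution for the full data: struct_arr is rebuilt here from only the rows shown in the question, and the names starts, ends and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the rows shown in the question (elided rows omitted).
struct_arr = np.array(
    [(60, 71, 10.0, 0.0), (60, 72, 0.0, 25.0), (60, 73, 0.0, 35.0),
     (61, 73, 5.0, 0.0), (65, 71, 5.0, 0.0), (67, 72, 5.0, 0.0),
     (67, 74, 0.0, 10.0)],
    dtype=[("idx", "i8"), ("lvl", "i8"), ("start", "f8"), ("end", "f8")],
)

df = pd.DataFrame(struct_arr).set_index("lvl")

# Assumption: 0.0 means "missing", so only positive values take part.
# lvl of the smallest positive start per idx
starts = df.loc[df["start"] > 0].groupby("idx")["start"].idxmin()
# lvl of the largest positive end per idx
ends = df.loc[df["end"] > 0].groupby("idx")["end"].idxmax()

# The inner join keeps only idx values that have both a start and an end.
result = pd.concat(
    [starts.rename("start_lvl"), ends.rename("end_lvl")],
    axis=1,
    join="inner",
)
print(result)
```

On the sample rows this keeps idx 60 with levels 71 and 73, and idx 67 with levels 72 and 74. Filtering before grouping also means no group consists only of missing values, so idxmin / idxmax never see an empty selection.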

Upvotes: 3
