Reputation: 2046
I have a numpy structured array that looks like this:
idx lvl start end
60 71 10.0 0.0
60 72 0.0 25.0
60 73 0.0 35.0
61 73 5.0 0.0
65 71 5.0 0.0
67 72 5.0 0.0
67 74 0.0 10.0
...
I want to build a new array from this one under the following conditions:
1) Only idx groups that have at least one start value and at least one end value are used (idx 60 and 67 in this example).
2) If a group has multiple start or end values, only the level of the smallest start value and the level of the biggest end value are used (idx 60 gets 71 and 73).
The result will look like this:
idx start_lvl end_lvl
60 71 73
67 72 74
I don't mind using pandas, but I'd like to avoid making additional arrays or using loops. Is there a simple way to do this?
Upvotes: 1
Views: 64
Reputation: 863166
First use Series.duplicated to keep only the rows whose idx value occurs more than once, then set the lvl column as the index so that DataFrameGroupBy.idxmax can return the lvl labels of the per-group maximum of each column:
import pandas as pd

#create DataFrame from structured array, thanks @SpghttCd
df = pd.DataFrame(struct_arr)
#store the result under a new name so df stays available for the next step
df1 = df[df['idx'].duplicated(keep=False)].set_index('lvl').groupby('idx').idxmax()
print (df1)
     start  end
idx
60      71   73
67      72   74
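To try the snippet above end to end, here is a minimal sketch that rebuilds the sample rows from the question. The dtype (field names and integer/float types) and the name res are assumptions, since the question only shows the printed array.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; the dtype below is an assumption.
struct_arr = np.array(
    [(60, 71, 10.0, 0.0),
     (60, 72, 0.0, 25.0),
     (60, 73, 0.0, 35.0),
     (61, 73, 5.0, 0.0),
     (65, 71, 5.0, 0.0),
     (67, 72, 5.0, 0.0),
     (67, 74, 0.0, 10.0)],
    dtype=[('idx', 'i8'), ('lvl', 'i8'), ('start', 'f8'), ('end', 'f8')],
)

# A structured array converts directly; field names become column names.
df = pd.DataFrame(struct_arr)

# Keep idx values that occur more than once, index by lvl, then take the
# lvl label of the largest start and the largest end within each idx group.
res = (df[df['idx'].duplicated(keep=False)]
       .set_index('lvl')
       .groupby('idx')
       .idxmax())
print(res)
```

Note that the duplicated filter only checks that an idx occurs more than once. It does not verify that the group really has both a start and an end value.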
According to the description, start needs idxmin instead, but idxmin returns the first minimum in each group, which here is a row with start 0.0:
df2 = (df[df['idx'].duplicated(keep=False)]
         .set_index('lvl')
         .groupby('idx')
         .agg({'start':'idxmin', 'end':'idxmax'}))
print (df2)
     start  end
idx
60      72   73
67      74   74
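If 0.0 is meant as "no value" (an assumption, the question does not say so explicitly), one possible way to get the smallest real start is to mask the zeros as NaN before aggregating. The sketch below is not part of the original answer. The dtype and the names masked, has_both, res, start_lvl and end_lvl are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; the dtype below is an assumption.
struct_arr = np.array(
    [(60, 71, 10.0, 0.0),
     (60, 72, 0.0, 25.0),
     (60, 73, 0.0, 35.0),
     (61, 73, 5.0, 0.0),
     (65, 71, 5.0, 0.0),
     (67, 72, 5.0, 0.0),
     (67, 74, 0.0, 10.0)],
    dtype=[('idx', 'i8'), ('lvl', 'i8'), ('start', 'f8'), ('end', 'f8')],
)
df = pd.DataFrame(struct_arr)

# Assumption: 0.0 marks a missing value, so replace it with NaN.
masked = df.assign(start=df['start'].where(df['start'] > 0),
                   end=df['end'].where(df['end'] > 0))

# Condition 1: keep only idx groups with at least one start AND one end.
# count() ignores NaN, so a count of 0 means the group has no such value.
has_both = (masked.groupby('idx')[['start', 'end']]
                  .transform('count')
                  .gt(0)
                  .all(axis=1))

# Condition 2: lvl of the smallest start and lvl of the largest end per idx.
# idxmin/idxmax skip NaN, so the masked zeros are never selected.
res = (masked[has_both]
       .set_index('lvl')
       .groupby('idx')
       .agg(start_lvl=('start', 'idxmin'), end_lvl=('end', 'idxmax')))
print(res)
```

On the sample rows this gives 71 and 73 for idx 60 and 72 and 74 for idx 67, which matches the result requested in the question. Filtering with has_both first also means no group reaches idxmin or idxmax with only NaN values.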
Upvotes: 3