Reputation: 2080
Say I have the following data frame,
df.head()
ID start end symbol symbol_id type
1 146 291 bus bus-201 CDS
1 146 314 bus bus-201 trans
1 603 243 bus bus-201 CDS
1 1058 2123 car car-203 CDS
1 910 81 car car-203 ex
1 2623 2686 car car-203 CDS
1 5948 6043 car car-203 CDS
1 6348 6474 car car-203 CDS
1 910 81 car car-201 ex
1 910 81 car car-201 ex
1 636 650 car car-203 CDS
1 202 790 train train-204 CDS
1 200 314 train train-204 CDS
1 202 837 train train-204 CDS
Now from the above data frame, I need to group by items based on column symbol_id
if column type
is CDS
. Then, I need to use the first value from the column start
as the value in start column of the new data frame and last value from column
endas the value in column
end`.
Finally, the df2
should look like,
start end symbol symbol_id type
146 243 bus bus-203 CDS
1058 650 car car-203 CDS
202 837 train train-204 CDS
I have tried using list
of values from df['symbol']
,
sym_list=df['symbol'].tolist().drop_duplicates()
for symbol in df['symbol'].values:
if symbol in tuple(sym_list):
df_symbol =df['symbol'].isin(symbol)
which threw the following error,
TypeError: only list-like objects are allowed to be passed to isin(), you passed a [str]
I was trying to capture the first and last value for each symbol
and symbol_id
value using,
start = df.query('type =="CDS"')[['start']].iloc[0]
end = df.query('type =="CDS"')[['end']].iloc[-1]
However, my data frame is quite big and I have more than 50,000 unique values for symbol
, hence I need a better solution here.
Any help or suggestions are appreciated!!
Upvotes: 1
Views: 43
Reputation: 4792
Try:
df_group = df[df['type']=='CDS'].groupby(['symbol_id', 'symbol', 'type'])
df_new = pd.DataFrame(columns =['start', 'end'])
df_new[['start', 'end']] = df_group.agg({'start':'first', 'end': 'last'})
df_new.reset_index()
symbol_id symbol start end type
0 bus-201 bus 146 243 CDS
1 car-203 car 1058 650 CDS
2 train-204 train 202 837 CDS
Edited using agg as used by @Dev Khadka
Upvotes: 1
Reputation: 5451
you can do it using group by and first and last aggrigate function
df[df["type"]=="CDS"].groupby("symbol_id").agg({"start":"first", "end":"last", "symbol":"first","symbol_id":"first", "type":"first"})
Upvotes: 2