김지영

Reputation: 359

groupby based on conditions

I'm working with a dataset. Here is a sample of it:

enter image description here

I wrote my code like this:

complete_data = complete_data.groupby(['STDR_YM_CD', 'TRDAR_CD' ]).sum().reset_index()

After executing the code, I got the DataFrame shown in the picture below:

enter image description here

But I want to aggregate the values based on the first three characters of the SVC_INDUTY_CD column, as in the picture below:

enter image description here

Here is a link to my data: http://blogattach.naver.com/c356df6c7f2127fbd539596759bfc1bd1848b453f1/20170316_215_blogfile/khm2963_1489653338468_dtPz6k_csv/test2.csv?type=attachment

Thanks in advance.

Upvotes: 0

Views: 66

Answers (1)

Andrew L

Reputation: 7038

I'm sure there's a better way, but here is one way you could do it:

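# Temporary column holding the first three characters of SVC_INDUTY_CD
complete_data['first_three_temp'] = complete_data['SVC_INDUTY_CD'].str[:3]
# Group by the original keys plus the temporary prefix column and sum each group
complete_data = complete_data.groupby(['STDR_YM_CD', 'TRDAR_CD', 'first_three_temp'], as_index=False).sum()
# Remove the temporary column again
complete_data.drop('first_three_temp', axis=1, inplace=True)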

This adds a temporary column containing only the first three characters of your SVC_INDUTY_CD column. You can then group by that column and drop it afterwards. As I said, there is probably a more efficient way, so I'm not sure how well this will scale if your dataset is large.
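If you would rather not add and drop a column, one possible variation is to pass the prefix to groupby as a Series alongside the column names. The sketch below is only an illustration: the sample rows, the numeric column SALES, and the name SVC_INDUTY_PREFIX are made up for the example and are not taken from your file.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question,
# but the values and the SALES column are invented for illustration.
complete_data = pd.DataFrame({
    "STDR_YM_CD": [201701, 201701, 201701, 201701],
    "TRDAR_CD": [1, 1, 1, 2],
    "SVC_INDUTY_CD": ["CS100001", "CS100002", "CS200001", "CS100001"],
    "SALES": [10, 20, 30, 40],
})

# Build the grouping key as a separate Series instead of a temporary column.
# The Series name (hypothetical) becomes the column name after reset_index().
prefix = complete_data["SVC_INDUTY_CD"].str[:3].rename("SVC_INDUTY_PREFIX")

# groupby accepts a mix of column labels and Series aligned on the same index.
# Selecting SALES explicitly keeps the string column out of the sum.
result = (
    complete_data
    .groupby(["STDR_YM_CD", "TRDAR_CD", prefix])["SALES"]
    .sum()
    .reset_index()
)

print(result)
```

With real data you would select all of your numeric columns in place of SALES. Whether this is actually faster than the temporary column approach is something you would need to measure on your own dataset.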

Upvotes: 1
