K C
K C

Reputation: 433

How to get desired dataframe as output from groupby and value counts in pandas

I am trying to query a dataframe and store the results into another dataframe.

The dataframe sample I have is:

 train       |start     |end      |passenger_type
 rajdhani    |howrah    |allahabad|normal
 hwh indb spl|howrah    |allahabad|tatkal
 hwh indb spl|allahabad |howrah   |normal
 

I have split the dataframe into two - one for passenger_type normal and one for tatkal. Then for each dataframe corresponding to passenger_type I want the result dataframe to be like:

 station1  |  station2      |  frequency | freq by rajdhani| freq by hwh indb
 howrah    |  allahabad     |  2         | 1               | 1
 allahabad |  howrah        |  1         | 0               | 0

I am using groupby with value_counts and sorting the results, but the result is not what I want.

I am getting something like this:

                end
  start| end | frequency

The code I am using is :

pd.DataFrame(train.groupby('start').end.value_counts().sort_values(ascending = False))

Can someone please help me with the code to obtain the desired results.

Upvotes: 1

Views: 53

Answers (1)

perl
perl

Reputation: 9941

You can calculate value_counts on start, end and train, and then unstack train:

z = (df[['start', 'end', 'train']]
         .value_counts()
         .unstack(fill_value=0)
         .add_prefix('freq by '))

z['frequency'] = z.sum(1)

df_out = z.sort_values('frequency', ascending=False).reset_index()
df_out

Output:

train      start        end  freq by hwh indb spl  freq by rajdhani  frequency
0         howrah  allahabad                     1                 1          2
1      allahabad     howrah                     1                 0          1

And for example, getting start, end and frequency:

df_out[['start', 'end', 'frequency']]

Output:

train      start        end  frequency
0         howrah  allahabad          2
1      allahabad     howrah          1

Upvotes: 4

Related Questions