Reputation: 433
I am trying to query a dataframe and store the results into another dataframe.
The dataframe sample I have is:
train |start |end |passenger_type
rajdhani |howrah |allahabad|normal
hwh indb spl|howrah |allahabad|tatkal
hwh indb spl|allahabad |howrah |normal
I have split the dataframe into two - one for passenger_type normal and one for tatkal. Then for each dataframe corresponding to passenger_type I want the result dataframe to be like:
station1 | station2 | frequency | freq by rajdhani| freq by hwh indb
howrah | allahabad | 2 | 1 | 1
allahabad | howrah | 1 | 0 | 0
I am using groupby with value_counts and sorting the results, but the result is not what I want.
I am getting something like this:
end
start| end | frequency
The code I am using is :
pd.DataFrame(train.groupby('start').end.value_counts().sort_values(ascending = False))
Can someone please help me with the code to obtain the desired results.
Upvotes: 1
Views: 53
Reputation: 9941
You can calculate value_counts
on start, end and train, and then unstack
train:
z = (df[['start', 'end', 'train']]
.value_counts()
.unstack(fill_value=0)
.add_prefix('freq by '))
z['frequency'] = z.sum(1)
df_out = z.sort_values('frequency', ascending=False).reset_index()
df_out
Output:
train start end freq by hwh indb spl freq by rajdhani frequency
0 howrah allahabad 1 1 2
1 allahabad howrah 1 0 1
And for example, getting start, end and frequency:
df_out[['start', 'end', 'frequency']]
Output:
train start end frequency
0 howrah allahabad 2
1 allahabad howrah 1
Upvotes: 4