Reputation: 4638
I'm struggling to figure this out without writing some hideous loop to go over and check the groups after the groupby and merge them together. I feel like their has to be a way to do this that I just don't know.
What I'm trying to do is group a dataframe by two of its columns, but sometimes there will be combinations of those groups where the keys are flipped (i.e. [key1, key2] will have a group and [key2, key1] will have a group. I actually want to merge the groups from these combinations into a single group.
It is possible to do in a loop after the fact. I have also tried using some methods like this:
unique combinations of values in selected columns in pandas data frame and count
but could not get it to work.
here is a sample of my df:
Ves-1 type Ves-2 type Duration
0 cargo tug 898.559993
1 fishing_trawling tug 898.559992
2 fishing_transiting tug 898.559993
3 fishing_transiting tug 898.559993
4 tug tug 898.559992
5 cargo tug 898.560002
6 cargo tug 898.560002
7 passenger tug 907.200008
8 pleasure tug 898.560003
9 cargo tug 898.559993
10 cargo tug 898.559993
11 cargo fishing_transiting 898.560002
12 cargo fishing_transiting 898.559993
13 cargo fishing_transiting 898.560002
14 tug fishing_transiting 898.560003
15 cargo fishing_transiting 907.200008
16 cargo fishing_transiting 907.200008
17 tug fishing_transiting 898.560002
18 cargo fishing_transiting 898.560002
19 fishing_transiting fishing_transiting 898.559993
if I just do a simple groupby using both Ves columns:
>>> test.groupby(['Ves-1 type','Ves-2 type'])['Duration'].agg(list)
Ves-1 type Ves-2 type
cargo fishing_transiting [898.560002, 898.5599930000001, 898.560002, 90...
tug [898.5599930000001, 898.560002, 898.560002, 89...
fishing_transiting fishing_transiting [898.5599930000001]
tug [898.5599930000001, 898.5599930000001]
fishing_trawling tug [898.5599920000001]
passenger tug [907.200008]
pleasure tug [898.560003]
tug fishing_transiting [898.560003, 898.560002]
tug [898.5599920000001]
the problem is now I have a fishing_transiting/tug combo and a tug/fishing_transiting combo... is there any way to have these groupes merged together?
EDIT - another workaround I tried that works, but would like to know if there's a way to just handle this in groupby:
>>> test['key'] = list(zip(test['Ves-1 type'].values, test['Ves-2 type'].values))
>>> test['key'] = test['key'].apply(sorted).astype(str)
>>> test.groupby('key')['Duration'].agg(list)
key
['cargo', 'fishing_transiting'] [898.560002, 898.5599930000001, 898.560002, 90...
['cargo', 'tug'] [898.5599930000001, 898.560002, 898.560002, 89...
['fishing_transiting', 'fishing_transiting'] [898.5599930000001]
['fishing_transiting', 'tug'] [898.5599930000001, 898.5599930000001, 898.560...
['fishing_trawling', 'tug'] [898.5599920000001]
['passenger', 'tug'] [907.200008]
['pleasure', 'tug'] [898.560003]
['tug', 'tug'] [898.5599920000001]
Upvotes: 2
Views: 1450
Reputation: 71687
Let's sort the values in the columns Ves-1 type
and Ves-2 type
along axis=1
then groupby
the dataframe on these sorted columns and agg
Duration
using list
:
c = ['Ves-1 type', 'Ves-2 type']
df.groupby(np.sort(df[c], axis=1).T.tolist())['Duration'].agg(list)
cargo fishing_transiting [898.5600019999999, 898.559993, 898.5600019999...
tug [898.559993, 898.5600019999999, 898.5600019999...
fishing_transiting fishing_transiting [898.559993]
tug [898.559993, 898.559993, 898.5600029999999, 89...
fishing_trawling tug [898.559992]
passenger tug [907.200008]
pleasure tug [898.5600029999999]
tug tug [898.559992]
Name: Duration, dtype: object
Upvotes: 4