Reputation: 272
I have a dataset like this:
user-id time location msg path
1 1 1 1 0
2 1 1 2 1000
3 1 2 3 1
4 1 2 0 0
5 1 3 0 0
1 2 2 2 0
2 2 1 1 1
3 2 1 1 1
4 2 0 0 0
5 2 0 0 0
1 3 1 3 0
2 3 3 1 0
I want to find the path based on the maximum number of msg where the time and the location are the same for two records.
time_locs = pd.unique(df['time_loc'])
for time_loc in time_locs:
dc_group = df[df['time_loc'] == time_loc]
if(len(dc_group) > 1):
max_num_msg = max(dc_group['msgs'])
So I zipped the time and location as time_loc and found the maximum number of msges. Now how can I found the path of that row?
For example, in this case my first dc-group is this two rows:
user-id time location msg path
1 1 1 1 0
2 1 1 2 1000
I want to find 1000.
I tried this code but it is not working.
user_group = df.loc[max(dc_group['msgs']), 'path']
because it is searching in all df. And .loc is not working for dc_group, it mean this code face error:
user_group = dc_group.loc[max(dc_group['msgs']), 'path']
Upvotes: 1
Views: 79
Reputation: 95873
You definitely want to use non-looping approaches here. You can use .argmax
to get the index of the maximum value instead of the value itself. Something like:
In [15]: df
Out[15]:
user-id time location msg path
0 1 1 1 1 0
1 2 1 1 1 0
2 3 1 2 0 0
3 4 1 2 0 0
4 5 1 3 0 0
5 1 2 2 2 0
6 2 2 1 1 0
7 3 2 1 1 0
8 4 2 0 0 0
9 5 2 0 0 0
10 1 3 1 3 0
11 2 3 3 1 0
In [16]: df.loc[df.time == df.location, 'msg'].argmax()
Out[16]: 5
In [17]: max_idx = df.loc[df.time == df.location, 'msg'].argmax()
In [18]: df.loc[max_idx]
Out[18]:
user-id 1
time 2
location 2
msg 2
path 0
Name: 5, dtype: int64
In [19]: df.loc[max_idx, 'path']
Out[19]: 0
If you want all the rows, then just use boolean indexing:
In [25]: df.loc[df.time == df.location]
Out[25]:
user-id time location msg path
0 1 1 1 1 0
1 2 1 1 1 0
5 1 2 2 2 0
11 2 3 3 1 0
Or .query
if you prefer:
In [26]: df.query('time == location')
Out[26]:
user-id time location msg path
0 1 1 1 1 0
1 2 1 1 1 0
5 1 2 2 2 0
11 2 3 3 1 0
Upvotes: 1