Elham
Elham

Reputation: 272

how can find the value of a row based on another column?

I have a dataset like this:

user-id     time    location   msg  path
  1           1         1       1    0
  2           1         1       2    1000
  3           1         2       3    1
  4           1         2       0    0
  5           1         3       0    0
  1           2         2       2    0
  2           2         1       1    1
  3           2         1       1    1
  4           2         0       0    0
  5           2         0       0    0
  1           3         1       3    0
  2           3         3       1    0

I want to find the path based on the maximum number of msg where the time and the location are the same for two records.

time_locs = pd.unique(df['time_loc'])
for time_loc in time_locs:
   dc_group = df[df['time_loc'] == time_loc]
   if(len(dc_group) > 1):
        max_num_msg = max(dc_group['msgs'])

So I zipped the time and location as time_loc and found the maximum number of msges. Now how can I found the path of that row?

For example, in this case my first dc-group is this two rows:

user-id     time    location   msg  path
  1           1         1       1    0
  2           1         1       2    1000

I want to find 1000.

I tried this code but it is not working.

user_group = df.loc[max(dc_group['msgs']), 'path']

because it is searching in all df. And .loc is not working for dc_group, it mean this code face error:

user_group = dc_group.loc[max(dc_group['msgs']), 'path']

Upvotes: 1

Views: 79

Answers (1)

juanpa.arrivillaga
juanpa.arrivillaga

Reputation: 95873

You definitely want to use non-looping approaches here. You can use .argmax to get the index of the maximum value instead of the value itself. Something like:

In [15]: df
Out[15]:
    user-id  time  location  msg  path
0         1     1         1    1     0
1         2     1         1    1     0
2         3     1         2    0     0
3         4     1         2    0     0
4         5     1         3    0     0
5         1     2         2    2     0
6         2     2         1    1     0
7         3     2         1    1     0
8         4     2         0    0     0
9         5     2         0    0     0
10        1     3         1    3     0
11        2     3         3    1     0

In [16]: df.loc[df.time == df.location, 'msg'].argmax()
Out[16]: 5

In [17]: max_idx = df.loc[df.time == df.location, 'msg'].argmax()

In [18]: df.loc[max_idx]
Out[18]:
user-id     1
time        2
location    2
msg         2
path        0
Name: 5, dtype: int64

In [19]: df.loc[max_idx, 'path']
Out[19]: 0

If you want all the rows, then just use boolean indexing:

In [25]: df.loc[df.time == df.location]
Out[25]:
    user-id  time  location  msg  path
0         1     1         1    1     0
1         2     1         1    1     0
5         1     2         2    2     0
11        2     3         3    1     0

Or .query if you prefer:

In [26]: df.query('time == location')
Out[26]:
    user-id  time  location  msg  path
0         1     1         1    1     0
1         2     1         1    1     0
5         1     2         2    2     0
11        2     3         3    1     0

Upvotes: 1

Related Questions