Andres

Reputation: 183

Grouping by value in one column and getting another column's value

This is the seed DataSet:

In[1]: import pandas as pd
       my_data = [
       {'client':'A','product_s_n':'1','status':'in_store','month':'Jan'},
       {'client':'A','product_s_n':'1','status':'sending', 'month':'Feb'}, 
       {'client':'A','product_s_n':'2','status':'in_store','month':'Jan'},
       {'client':'A','product_s_n':'2','status':'in_store','month':'Feb'},
       {'client':'B','product_s_n':'3','status':'in_store','month':'Jan'},
       {'client':'B','product_s_n':'3','status':'sending', 'month':'Feb'},
       {'client':'B','product_s_n':'4','status':'in_store','month':'Jan'},
       {'client':'B','product_s_n':'4','status':'in_store','month':'Feb'},
       {'client':'C','product_s_n':'5','status':'in_store','month':'Jan'},
       {'client':'C','product_s_n':'5','status':'sending', 'month':'Feb'}]
df = pd.DataFrame(my_data)
df

Out[1]:
      client    month   product_s_n   status
0       A       Jan     1             in_store
1       A       Feb     1             sending
2       A       Jan     2             in_store
3       A       Feb     2             in_store
4       B       Jan     3             in_store
5       B       Feb     3             sending
6       B       Jan     4             in_store
7       B       Feb     4             in_store
8       C       Jan     5             in_store
9       C       Feb     5             sending

The question I want to ask of this data is: which client does each product serial number (product_s_n) belong to? For the data in this example, this is what the resulting DataFrame should look like (I need a new DataFrame as the result):

    product_s_n    client   
0        1            A
1        2            A
2        3            B
3        4            B
4        5            C

As you may have noticed, the 'status' and 'month' fields are only there to give the sample dataset some context and structure. I tried using groupby, with no success. Any ideas?

Thanks!

Upvotes: 0

Views: 46

Answers (1)

unutbu

Reputation: 879869

After calling df.groupby(['product_s_n']) you can restrict attention to a particular column by indexing with ['client']. You can then select the first value of client from each group by calling first().

>>> df.groupby(['product_s_n'])['client'].first()    
product_s_n
1              A
2              A
3              B
4              B
5              C
Name: client, dtype: object
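
Note that the result above is a Series indexed by product_s_n, while the question asks for a new DataFrame. A minimal sketch of one way to get there, assuming the same sample data as in the question: reset_index() moves product_s_n back into an ordinary column. The names result and alt are only illustrative. The drop_duplicates variant is an alternative that should give the same rows, provided each serial number belongs to exactly one client.

```python
import pandas as pd

# Sample data copied from the question.
my_data = [
    {'client': 'A', 'product_s_n': '1', 'status': 'in_store', 'month': 'Jan'},
    {'client': 'A', 'product_s_n': '1', 'status': 'sending',  'month': 'Feb'},
    {'client': 'A', 'product_s_n': '2', 'status': 'in_store', 'month': 'Jan'},
    {'client': 'A', 'product_s_n': '2', 'status': 'in_store', 'month': 'Feb'},
    {'client': 'B', 'product_s_n': '3', 'status': 'in_store', 'month': 'Jan'},
    {'client': 'B', 'product_s_n': '3', 'status': 'sending',  'month': 'Feb'},
    {'client': 'B', 'product_s_n': '4', 'status': 'in_store', 'month': 'Jan'},
    {'client': 'B', 'product_s_n': '4', 'status': 'in_store', 'month': 'Feb'},
    {'client': 'C', 'product_s_n': '5', 'status': 'in_store', 'month': 'Jan'},
    {'client': 'C', 'product_s_n': '5', 'status': 'sending',  'month': 'Feb'},
]
df = pd.DataFrame(my_data)

# groupby(...).first() gives a Series indexed by product_s_n;
# reset_index() turns that index into a regular column, yielding a DataFrame.
result = df.groupby('product_s_n')['client'].first().reset_index()

# Alternative without groupby: keep the two columns of interest and
# drop repeated (product_s_n, client) pairs. This matches `result` only
# when every serial number maps to a single client.
alt = df[['product_s_n', 'client']].drop_duplicates().reset_index(drop=True)

print(result)
```

If a serial number could ever appear under more than one client, first() would silently pick the first one seen, whereas drop_duplicates would keep one row per distinct pair, so the two approaches are worth comparing on real data.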

Upvotes: 2
