Marnix

Reputation: 769

Getting lists of indices from pandas dataframe

I'm trying to get a list of indices out of a pandas dataframe.

First do an import.

import pandas as pd

Construct a pandas dataframe.

# Create dataframe
data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'reports': [4, 24, 31, 2, 3, 5, 10],
        'coverage': [True, False, False, False, True, True, False]}
df = pd.DataFrame(data)
print(df)

Output:

  coverage   name  reports
0     True  Jason        4
1    False  Jason       24
2    False   Tina       31
3    False   Tina        2
4     True   Tina        3
5     True  Jason        5
6    False   Tina       10

For each name separately, I would like the index labels (the numbers on the left of the dataframe) of the rows where coverage is True. Preferably this should be done without an explicit for-loop.

Desired output is something like this.

list_Jason = [0, 5]
list_Tina = [4]

Attempted solution: I thought I should use 'groupby' and then access the coverage column, but from there I don't know how to proceed. All help is appreciated.

df.groupby('name')['coverage']

Upvotes: 3

Views: 1405

Answers (3)

cs95

Reputation: 402253

This is doable, using boolean indexing first followed by the groupby:

In [942]: df[df.coverage].groupby('name').agg({'reports' : lambda x: list(x.index)})
Out[942]: 
      reports
name         
Jason  [0, 5]
Tina      [4]

Here DataFrameGroupBy.agg returns the output as a column of lists, indexed by name.
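The 'reports' key in the call above only serves as a carrier column, since the lambda reads nothing but the index. As a minimal sketch of the same filter-then-group idea without picking an arbitrary column, the index labels can be turned into a Series and grouped directly. The names `covered`, `index_lists` and `result` are illustrative and not part of the answer.

```python
import pandas as pd

data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'reports': [4, 24, 31, 2, 3, 5, 10],
        'coverage': [True, False, False, False, True, True, False]}
df = pd.DataFrame(data)

# Boolean indexing: keep only the rows where coverage is True.
covered = df[df['coverage']]

# Group the index labels by name and collect each group into a list.
index_lists = covered.index.to_series().groupby(covered['name']).agg(list)

# Convert to a plain dict of lists, keyed by name.
result = index_lists.to_dict()
print(result)
```

Because the rows are filtered before grouping, a name with no True rows would not appear in the result at all.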

Upvotes: 1

Binyamin Even

Reputation: 3382

This should work:

grouped = df.groupby('name').apply(lambda x: x.index[x.coverage].values)

Output:

name
Jason    [0, 5]
Tina        [4]
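
The values in this Series are NumPy arrays rather than lists. A hedged variant, assuming plain Python lists are wanted: select the coverage column first and let each group act as its own boolean mask. The name `per_name` is illustrative.

```python
import pandas as pd

data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'reports': [4, 24, 31, 2, 3, 5, 10],
        'coverage': [True, False, False, False, True, True, False]}
df = pd.DataFrame(data)

# Each group `s` is a boolean Series; s[s] keeps only its True entries,
# so s[s].index holds the labels of the covered rows for that name.
per_name = df.groupby('name')['coverage'].apply(lambda s: s[s].index.tolist())

list_Jason = per_name['Jason']
list_Tina = per_name['Tina']
print(list_Jason, list_Tina)
```

Since the grouping happens before the filtering, a name with no True rows should still show up here, with an empty list, instead of being dropped.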

Upvotes: 0

greg_data

Reputation: 2293

You want to get the index out for each group.

This is stored in the 'groups' attribute of a GroupBy object.

# filter for coverage == True
# group by 'name'
# access the 'groups' attribute
by_person = df[df.coverage].groupby('name').groups

This returns:

{'Jason': Int64Index([0, 5], dtype='int64'),
 'Tina': Int64Index([4], dtype='int64')}

From which you can access the individuals as you would a regular dictionary:

by_person['Jason']

returns:

Int64Index([0, 5], dtype='int64')

You can index and iterate over this much like a regular list.
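
If actual Python lists are needed, as in the desired output of the question, each index object can be converted with .tolist(). A minimal sketch follows; `index_lists` is an illustrative name. The comprehension loops over the groups only, not over the rows. On pandas 2.0 or later the objects print as Index([...], dtype='int64'), because Int64Index was removed in that release.

```python
import pandas as pd

data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'reports': [4, 24, 31, 2, 3, 5, 10],
        'coverage': [True, False, False, False, True, True, False]}
df = pd.DataFrame(data)

# Filter for coverage == True, group by name, and read the 'groups' mapping.
by_person = df[df['coverage']].groupby('name').groups

# Convert each pandas Index into a plain Python list.
index_lists = {name: idx.tolist() for name, idx in by_person.items()}

list_Jason = index_lists['Jason']
list_Tina = index_lists['Tina']
print(list_Jason, list_Tina)
```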

Upvotes: 2
