Reputation: 73
The goal is to create a dictionary from a pandas column (series) where the keys are the unique elements of the column, and the values are the row indices in which the elements occur. I currently have code that accomplishes this, but I'm wondering if there is a simpler and less hacky way to do it:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, size=(1000, 4)), columns=list('ABCD'))
idx = df['A'].reset_index().groupby('A')['index'].apply(tuple).to_dict()
Upvotes: 4
Views: 929
Reputation: 59579
This is the groups attribute of a GroupBy object. It returns a dict whose keys are the unique values and whose values are Index objects holding the matching row labels of the original DataFrame.
df.groupby('A').groups
{0: Int64Index([61, 466, 505, 619, 697, 811, 872], dtype='int64'),
1: Int64Index([125, 254, 278, 330, 390, 396, 670, 732, 748, 849, 871, 880, 882,
908, 943], dtype='int64'),
2: Int64Index([77, 283, 401, 543, 544, 693, 816], dtype='int64'),
...}
Or if you really need the tuples:
{k: tuple(v) for k,v in df.groupby('A').groups.items()}
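For a result that is easy to verify by eye, here is a minimal sketch on a small hand-made frame. The six-row column is made up for illustration and is not the random data from the question.

```python
import pandas as pd

# Small hand-made column (illustrative data, not from the question).
df = pd.DataFrame({"A": [5, 7, 5, 9, 7, 5]})

# .groups maps each unique value of A to an Index of row labels.
groups = df.groupby("A").groups

# Turn each Index into a tuple, as in the one-liner above.
idx = {k: tuple(v) for k, v in groups.items()}
```

Here idx compares equal to {5: (0, 2, 5), 7: (1, 4), 9: (3,)}. Note that the values are row labels, so with a non-default index you get the labels rather than positions.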
Upvotes: 4
Reputation: 323396
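You can also build it with a dict comprehension over the groups. Group on 'A' only; grouping on list(df) would key the result by every column combination instead:
d = {x: y['index'].tolist() for x, y in df.reset_index().groupby('A')}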
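As a rough check of the comprehension approach, the sketch below runs it on a small made-up frame and compares it with the groups attribute. Grouping on column A alone is an assumption about the intent, and the data and the extra column B are purely illustrative.

```python
import pandas as pd

# Illustrative frame; column B is only there to show that other columns are ignored.
df = pd.DataFrame({"A": [5, 7, 5, 9, 7, 5], "B": list("uvwxyz")})

# reset_index() exposes the row labels as a column named "index",
# so each group's labels can be collected as a plain list.
d = {key: grp["index"].tolist() for key, grp in df.reset_index().groupby("A")}

# The same mapping, built from the groups attribute for comparison.
via_groups = {k: list(v) for k, v in df.groupby("A").groups.items()}
```

Both d and via_groups compare equal to {5: [0, 2, 5], 7: [1, 4], 9: [3]}.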
Upvotes: 1