For each value in the val lists, I want to display the users that have used it.
import pandas as pd
user = ['alice', 'bob', 'tim', 'alice']
val = [['a','b','c'],['a'],['c','d'],['a','d']]
df = pd.DataFrame({'user': user, 'val': val})
    user        val
0  alice  [a, b, c]
1    bob        [a]
2    tim     [c, d]
3  alice     [a, d]
Desired output:
val users
a [alice,bob]
b [alice]
c [alice,tim]
d [alice,tim]
Any ideas?
I think you need:
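df2 = (pd.DataFrame(df['val'].values.tolist(), index=df['user'].values)  # one column per list position
       .stack()                              # long format: one row per (user, position)
       .reset_index(name='val')              # unnamed index levels become level_0 (user) and level_1
       .groupby('val')['level_0']
       .unique()                             # unique users per value, in order of first appearance
       .reset_index()
       .rename(columns={'level_0': 'user'})
      )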
print(df2)
val user
0 a [alice, bob]
1 b [alice]
2 c [alice, tim]
3 d [tim, alice]
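To see where the name level_0 in the code above comes from, here is a small sketch that rebuilds the intermediate long-format frame from the question's sample data. The names wide, long and users_per_val are only illustrative. Whether stack() keeps the padding cells from the ragged lists depends on your pandas version, but groupby() skips missing keys by default, so the grouped result is the same either way.

```python
import pandas as pd

user = ['alice', 'bob', 'tim', 'alice']
val = [['a', 'b', 'c'], ['a'], ['c', 'd'], ['a', 'd']]
df = pd.DataFrame({'user': user, 'val': val})

# Ragged lists become a wide frame padded with missing values:
# one row per user, one column per list position (0, 1, 2).
wide = pd.DataFrame(df['val'].values.tolist(), index=df['user'].values)

# stack() moves the columns into a second index level. Neither level has a
# name, so reset_index() labels them level_0 (the user) and level_1 (position).
long = wide.stack().reset_index(name='val')
print(long.columns.tolist())  # ['level_0', 'level_1', 'val']

# groupby() drops missing keys by default, so any padding cells that
# survive stack() never form a group of their own.
users_per_val = long.groupby('val')['level_0'].unique()
print(users_per_val.apply(list).to_dict())
```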
Step 1
Reshape your data:
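from itertools import chain
# Flatten the lists and repeat each user once per element of its list
# (.values drops the duplicated index labels that repeat() produces).
df = pd.DataFrame({
    'val': list(chain.from_iterable(df.val.tolist())),
    'user': df.user.repeat(df.val.str.len()).values,
})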
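On pandas 0.25 or newer, DataFrame.explode can do this reshape without itertools (the ignore_index argument needs pandas 1.1 or newer). A minimal sketch, starting again from the question's df; long and result are illustrative names:

```python
import pandas as pd

user = ['alice', 'bob', 'tim', 'alice']
val = [['a', 'b', 'c'], ['a'], ['c', 'd'], ['a', 'd']]
df = pd.DataFrame({'user': user, 'val': val})

# explode() gives every list element its own row and repeats 'user' alongside it;
# ignore_index=True renumbers the rows 0..n-1 instead of repeating the old labels.
long = df.explode('val', ignore_index=True)

# Same grouping as Step 2: unique users per value, in order of first appearance.
result = long.groupby('val')['user'].apply(lambda x: x.unique().tolist())
print(result)
```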
Step 2
Use groupby + apply + unique:
df.groupby('val').user.apply(lambda x: x.unique().tolist())
val
a [alice, bob]
b [alice]
c [alice, tim]
d [tim, alice]
Name: user, dtype: object
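The question's desired output is a two-column table with a users column rather than a Series. A sketch under that assumption: run Step 1 and Step 2, then call reset_index(name='users') on the result. Here the reshaped frame is called long instead of overwriting df, and map(len) stands in for .str.len() (both count the elements of each list).

```python
import pandas as pd
from itertools import chain

user = ['alice', 'bob', 'tim', 'alice']
val = [['a', 'b', 'c'], ['a'], ['c', 'd'], ['a', 'd']]
df = pd.DataFrame({'user': user, 'val': val})

# Step 1: flatten the lists and repeat each user once per element of its list.
long = pd.DataFrame({
    'val': list(chain.from_iterable(df['val'].tolist())),
    'user': df['user'].repeat(df['val'].map(len)).values,
})

# Step 2, then turn the Series (indexed by 'val') into two columns
# named as in the question's desired output.
out = (long.groupby('val')['user']
           .apply(lambda x: x.unique().tolist())
           .reset_index(name='users'))
print(out)
```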
This is my approach.
df2 = (df
.set_index('user')
.val
.apply(pd.Series)
.stack()
.reset_index(name='val') # Reshape the data
.groupby(['val'])
.user
.apply(lambda x: sorted(set(x)))) # Show users that use the value
Output:
print(df2)
# val
# a [alice, bob]
# b [alice]
# c [alice, tim]
# d [alice, tim]
# Name: user, dtype: object
I don't have enough reputation to post this as a comment, but if you want to print the result without the index, the question "How to print dataframe without index" has the answer.
Basically, change the last line to:
print(df2.to_string(index=False))
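For example, with a frame that simply copies the result printed in the first answer (a hand-written stand-in, not recomputed here):

```python
import pandas as pd

# Stand-in for df2: the val/user result printed in the first answer.
df2 = pd.DataFrame({
    'val': ['a', 'b', 'c', 'd'],
    'user': [['alice', 'bob'], ['alice'], ['alice', 'tim'], ['tim', 'alice']],
})

# to_string(index=False) renders the frame without the 0..n-1 row labels.
text = df2.to_string(index=False)
print(text)
```

The groupby results in the other answers are Series indexed by val; calling reset_index() on them first gives a DataFrame, so the val column is still shown when printed with index=False.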