Reputation: 555
I have a pandas dataframe from data I read from a CSV. One column is for the name of a group, while the other column contains a string (that looks like a list), like the following:
Group | Followers
------------------------------------------
biebers | u'user1', u'user2', u'user3'
catladies | u'user4', u'user5'
bkworms | u'user6', u'user7'
I'd like to try to split up the strings in the "Followers" column and make a separate dataframe where each row is for a user, as well as a column showing which group they're in. So for this example I'd like to get the following:
User | Group
--------------------------------
user1 | biebers
user2 | biebers
user3 | biebers
user4 | catladies
user5 | catladies
user6 | bkworms
user7 | bkworms
Anyone have suggestions for the best way to approach this?
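For anyone wanting to reproduce this, the example can be rebuilt roughly as follows. This is only a sketch: the literal strings are assumed from the table above, not taken from the actual CSV.

```python
import pandas as pd

# Assumed reconstruction of the example data; the real CSV may differ.
df = pd.DataFrame({
    "Group": ["biebers", "catladies", "bkworms"],
    "Followers": [
        "u'user1', u'user2', u'user3'",
        "u'user4', u'user5'",
        "u'user6', u'user7'",
    ],
})
print(df)
```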
Upvotes: 2
Views: 943
Reputation: 294218
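# Strip the u'...' wrappers. regex=True is needed on pandas 2.0+, where str.replace matches literally by default.
df.Followers = df.Followers.str.replace(r"u'([^']*)'", r'\1', regex=True)
# Split on commas, stack into one row per user, and use User as the index.
df.set_index('Group').Followers.str.split(r',\s*', expand=True) \
.stack().rename('User').reset_index('Group').set_index('User')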
To keep User as a column:
# regex=True is needed on pandas 2.0+, where str.replace matches literally by default.
df.Followers = df.Followers.str.replace(r"u'([^']*)'", r'\1', regex=True)
df.set_index('Group').Followers.str.split(r',\s*', expand=True) \
.stack().rename('User').reset_index('Group') \
.reset_index(drop=True)[['User', 'Group']]
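On pandas 0.25 or later, a possible alternative is to extract the names with str.findall and let explode produce one row per user. This is a sketch rather than the method above; the sample frame is assumed from the question's table.

```python
import pandas as pd

# Sample data assumed from the question's table, not the real CSV.
df = pd.DataFrame({
    "Group": ["biebers", "catladies", "bkworms"],
    "Followers": [
        "u'user1', u'user2', u'user3'",
        "u'user4', u'user5'",
        "u'user6', u'user7'",
    ],
})

# Pull every quoted name out of each string, giving one list per group.
users = df["Followers"].str.findall(r"u'([^']*)'")

# explode() gives each list element its own row and repeats the Group value.
result = (
    df.assign(User=users)
      .explode("User")
      .reset_index(drop=True)[["User", "Group"]]
)
print(result)
```

This avoids the intermediate wide frame from expand=True, so groups with different numbers of followers do not produce padding cells that have to be dropped afterwards.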
Upvotes: 2