Reputation: 145
I have the following DataFrame structure:
profile_id user birthday
123, 124 test1 day1
131, 132 test2 day2
What I need to display is:
profile_id user birthday
123 test1 day1
124 test1 day1
131 test2 day2
132 test2 day2
In the profile_id column I have a couple of ids separated with a comma, and I need to loop through each id.
Upvotes: 5
Views: 146
Reputation: 51165
Using extractall
and join
:
df.join(
df.pop('profile_id').str.extractall(r'(\d+)').reset_index(1, drop=True)
).rename(columns={0: 'profile_id'})
user birthday profile_id
0 test1 day1 123
0 test1 day1 124
1 test2 day2 131
1 test2 day2 132
Upvotes: 2
Reputation: 294508
df.loc[df.index.repeat(df.profile_id.str.count(', ') + 1)].assign(
profile_id=', '.join(df.profile_id).split(', '))
profile_id user birthday
0 123 test1 day1
0 124 test1 day1
1 131 test2 day2
1 132 test2 day2
sep = ', '
idx = df.index.repeat(df.profile_id.str.count(sep) + 1)
new = sep.join(df.profile_id).split(sep)
df.loc[idx].assign(profile_id=new)
profile_id user birthday
0 123 test1 day1
0 124 test1 day1
1 131 test2 day2
1 132 test2 day2
loc
also get a fresh index
sep = ', '
col = 'profile_id'
p = df[col]
i = np.arange(len(df)).repeat(p.str.count(sep) + 1)
pd.DataFrame({
col: sep.join(p).split(sep),
**{c: df[c].values[i] for c in df if c != col}
}, columns=df.columns)
profile_id user birthday
0 123 test1 day1
1 124 test1 day1
2 131 test2 day2
3 132 test2 day2
Upvotes: 2
Reputation: 40918
You can also do this with a combination of concat()
and .melt()
:
>>> pd.concat((
... df['profile_id'].str.split(', ', expand=True),
... df.drop('profile_id', axis=1)), axis=1)\
... .melt(id_vars=['user', 'birthday'], value_name='profile_id')\
... .drop('variable', axis=1)
user birthday profile_id
0 test1 day1 123
1 test2 day2 131
2 test1 day1 124
3 test2 day2 132
Upvotes: 3
Reputation: 4100
df.profile_id.str.split(",",expand=True).set_index(a.user).stack().reset_index(level=1, drop=True).reset_index().rename(columns={0:"profile_id"})
Upvotes: 2
Reputation: 77007
Here's one way to do
In [1127]: dfs = (df.profile_id.str.split(', ', expand=True).stack()
.reset_index(name='profile_id'))
In [1128]: df.loc[dfs.level_0].assign(profile_id=dfs.profile_id)
Out[1128]:
profile_id user birthday
0 123 test1 day1
0 123 test1 day1
1 124 test2 day2
1 124 test2 day2
Upvotes: 3