Reputation: 145

Manipulate pandas dataframe to display desired output

I have the following DataFrame structure:

profile_id  user   birthday
123, 124    test1  day1
131, 132    test2  day2

What I need to display is:

profile_id  user   birthday
123        test1   day1 
124        test1   day1
131        test2   day2
132        test2   day2

In the profile_id column I have a couple of ids separated with a comma, and I need to loop through each id.

Upvotes: 5

Answers (5)

user3483203

Reputation: 51165

Using extractall and join:

df.join(
    df.pop('profile_id').str.extractall(r'(\d+)').reset_index(1, drop=True)
).rename(columns={0: 'profile_id'})

    user birthday profile_id
0  test1     day1        123
0  test1     day1        124
1  test2     day2        131
1  test2     day2        132

Upvotes: 2

piRSquared

Reputation: 294508

One-liner

df.loc[df.index.repeat(df.profile_id.str.count(', ') + 1)].assign(
    profile_id=', '.join(df.profile_id).split(', '))

  profile_id   user birthday
0        123  test1     day1
0        124  test1     day1
1        131  test2     day2
1        132  test2     day2

Broken down

sep = ', '
idx = df.index.repeat(df.profile_id.str.count(sep) + 1)
new = sep.join(df.profile_id).split(sep)
df.loc[idx].assign(profile_id=new)

  profile_id   user birthday
0        123  test1     day1
0        124  test1     day1
1        131  test2     day2
1        132  test2     day2

Numpy slice instead of `loc`

also get a fresh index

sep = ', '
col = 'profile_id'
p = df[col]
i = np.arange(len(df)).repeat(p.str.count(sep) + 1)
pd.DataFrame({
    col: sep.join(p).split(sep),
    **{c: df[c].values[i] for c in df if c != col}
}, columns=df.columns)

  profile_id   user birthday
0        123  test1     day1
1        124  test1     day1
2        131  test2     day2
3        132  test2     day2

Upvotes: 2

Brad Solomon

Reputation: 40918

You can also do this with a combination of concat() and .melt():

>>> pd.concat((
...            df['profile_id'].str.split(', ', expand=True),
...            df.drop('profile_id', axis=1)), axis=1)\
...     .melt(id_vars=['user', 'birthday'], value_name='profile_id')\
...     .drop('variable', axis=1)
    user birthday profile_id
0  test1     day1        123
1  test2     day2        131
2  test1     day1        124
3  test2     day2        132

Upvotes: 3

Rahul Agarwal

Reputation: 4100

df.profile_id.str.split(",",expand=True).set_index(a.user).stack().reset_index(level=1, drop=True).reset_index().rename(columns={0:"profile_id"})

Upvotes: 2

Zero

Reputation: 77007

Here's one way to do

In [1127]: dfs = (df.profile_id.str.split(', ', expand=True).stack()
                   .reset_index(name='profile_id'))

In [1128]: df.loc[dfs.level_0].assign(profile_id=dfs.profile_id)
Out[1128]:
  profile_id   user birthday
0        123  test1     day1
0        123  test1     day1
1        124  test2     day2
1        124  test2     day2

Upvotes: 3

Manipulate pandas dataframe to display desired output

Answers (5)

One-liner

Broken down

Numpy slice instead of loc

Related Questions

Numpy slice instead of `loc`