Mary

Reputation: 1142

Repeating items in a data frame using pandas

I have the following DataFrame:

id  z2  z3  z4  
1   2   a   fine 
2   7   b   good
3   9   c   delay
4   30  d   cold

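I want to generate a new DataFrame by repeating each row twice, so that every row appears three times in total. The value in column z4 should not be repeated: it should appear only in the first row of each group and be left empty in the copies. How can I do this using Python and pandas?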

The output should be like this:

id  z2  z3  z4
1   2   a   fine 
1   2   a   
1   2   a   
2   7   b   good
2   7   b   
2   7   b   
3   9   c   delay
3   9   c   
3   9   c   
4   30  d   cold
4   30  d   
4   30  d   

Upvotes: 1

Views: 150

Answers (2)

Vidhya G

Reputation: 2320

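Another way to do this is to use indexing. Notice that df.iloc[[0, 1, 2, 3]*2, :3] gives you two copies of every row, restricted to the first three columns.

These copies can then be concatenated with the original df, which leaves NaN in z4 for the copied rows. Replace the NaN with empty strings, then sort on the index values and reset the index (dropping the old one). On pandas 2.0 and later, where DataFrame.append is no longer available, the concatenation is written with pd.concat, and a stable sort keeps each original row ahead of its copies. All of this can be chained:

pd.concat([df, df.iloc[[0, 1, 2, 3]*2, :3]]).fillna('').sort_index(kind='stable').reset_index(drop=True)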

which produces:

    id  z2 z3     z4
0    1   2  a   fine
1    1   2  a       
2    1   2  a       
3    2   7  b   good
4    2   7  b       
5    2   7  b       
6    3   9  c  delay
7    3   9  c       
8    3   9  c       
9    4  30  d   cold
10   4  30  d       
11   4  30  d
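
For reference, here is a minimal self-contained sketch of the same indexing idea that does not hard-code the row positions. The helper name repeat_without and its parameters blank_col and extra are hypothetical, introduced only for illustration, and the sketch assumes the frame has the default 0..n-1 index so that sorting on the index brings each row and its copies together.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "z2": [2, 7, 9, 30],
    "z3": ["a", "b", "c", "d"],
    "z4": ["fine", "good", "delay", "cold"],
})


def repeat_without(frame, blank_col="z4", extra=2):
    """Hypothetical helper: add `extra` copies of each row with blank_col left empty."""
    # Row positions 0..n-1, repeated `extra` times
    positions = list(range(len(frame))) * extra
    # Copies carry every column except the one that must not repeat
    copies = frame.drop(columns=blank_col).iloc[positions]
    out = pd.concat([frame, copies])
    # The copies have no value in blank_col, so fill the gaps with ""
    out[blank_col] = out[blank_col].fillna("")
    # A stable sort keeps each original row ahead of its copies
    return out.sort_index(kind="stable").reset_index(drop=True)


result = repeat_without(df)
print(result)
```

Filling only blank_col, rather than the whole frame, avoids touching missing values that might exist in other columns.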

Upvotes: 2

Dennis Golomazov

Reputation: 17339

groupby and apply will do the trick:

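import pandas as pd

def func(group):
    # Copy the group's rows and blank out z4 in the copy
    copy = group.copy()
    copy['z4'] = ""
    # Original rows first, followed by two blanked copies
    return pd.concat((group, copy, copy))

df.groupby('id').apply(func).reset_index(drop=True)

This produces: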

    id  z2 z3     z4
0    1   2  a   fine
1    1   2  a       
2    1   2  a       
3    2   7  b   good
4    2   7  b       
5    2   7  b       
6    3   9  c  delay
7    3   9  c       
8    3   9  c       
9    4  30  d   cold
10   4  30  d       
11   4  30  d       
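
As a variation on the same per-group idea, here is a minimal sketch that iterates over the groups directly instead of calling apply. This is a swapped-in formulation rather than the code above; the helper name expand and its parameters blank_col and extra are hypothetical. Iterating sidesteps any version-dependent handling of the grouping column inside apply, since each group yielded by the iteration is a plain sub-frame with all of its columns.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "z2": [2, 7, 9, 30],
    "z3": ["a", "b", "c", "d"],
    "z4": ["fine", "good", "delay", "cold"],
})


def expand(group, blank_col="z4", extra=2):
    """Hypothetical helper: the group's rows followed by `extra` blanked copies."""
    blank = group.copy()
    blank[blank_col] = ""
    return pd.concat([group] + [blank] * extra)


# groupby sorts the keys by default, so the groups come out in id order
pieces = [expand(group) for _, group in df.groupby("id")]
result = pd.concat(pieces, ignore_index=True)
print(result)
```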

Upvotes: 1
