Reputation: 1142
I have the following DataFrame:
id z2 z3 z4
1 2 a fine
2 7 b good
3 9 c delay
4 30 d cold
I want to generate a new DataFrame in which every row is repeated two more times, except that the value in column z4 should not be repeated (it should appear only on the first occurrence). How can I do this using Python and pandas?
The output should be like this:
id z2 z3 z4
1 2 a fine
1 2 a
1 2 a
2 7 b good
2 7 b
2 7 b
3 9 c delay
3 9 c
3 9 c
4 30 d cold
4 30 d
4 30 d
Upvotes: 1
Views: 150
Reputation: 2320
Another way to do this is to use indexing:
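Notice that df.iloc[[0, 1, 2, 3]*2, :3] gives you two copies of the first three columns of every row. Concatenate these with the original df (DataFrame.append was removed in pandas 2.0, so use pd.concat), fill the resulting NaN values in z4 with empty strings, then sort on the index values with a stable sort so that each original row stays ahead of its copies, and reset the index (dropping the old one). All of this can be chained:
pd.concat([df, df.iloc[[0, 1, 2, 3]*2, :3]]).fillna('').sort_index(kind='mergesort').reset_index(drop=True)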
which produces:
id z2 z3 z4
0 1 2 a fine
1 1 2 a
2 1 2 a
3 2 7 b good
4 2 7 b
5 2 7 b
6 3 9 c delay
7 3 9 c
8 3 9 c
9 4 30 d cold
10 4 30 d
11 4 30 d
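The hard-coded positions [0, 1, 2, 3] and the column slice :3 only fit this four-row example. Below is a minimal sketch of the same indexing idea for a frame of any length. The helper name repeat_rows_blank_column and its parameters are hypothetical, and it assumes the frame has a unique, already sorted index (such as the default RangeIndex).

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "z2": [2, 7, 9, 30],
    "z3": ["a", "b", "c", "d"],
    "z4": ["fine", "good", "delay", "cold"],
})

def repeat_rows_blank_column(frame, n_copies=2, blank_col="z4"):
    """Hypothetical helper: follow each row with n_copies copies whose blank_col is ''."""
    # Positions of every row, repeated n_copies times.
    positions = list(range(len(frame))) * n_copies
    # The copies carry every column except the one to blank out.
    copies = frame.iloc[positions].drop(columns=blank_col)
    # blank_col is missing in the copies, so it comes out as NaN after concat.
    out = pd.concat([frame, copies])
    out[blank_col] = out[blank_col].fillna("")
    # A stable sort keeps each original row ahead of its copies.
    return out.sort_index(kind="mergesort").reset_index(drop=True)

result = repeat_rows_blank_column(df)
print(result)
```

Filling only blank_col, rather than calling fillna('') on the whole frame, leaves any genuine missing values in the other columns untouched.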
Upvotes: 2
Reputation: 17339
groupby and apply will do the trick:
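import pandas as pd

def func(group):
    copy = group.copy()
    copy['z4'] = ""  # blank out z4 in the copies
    # Original rows first, then the blanked copy twice.
    return pd.concat((group, copy, copy))

# Select the columns explicitly so 'id' stays in each group on newer pandas.
df.groupby('id')[list(df.columns)].apply(func).reset_index(drop=True)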
id z2 z3 z4
0 1 2 a fine
1 1 2 a
2 1 2 a
3 2 7 b good
4 2 7 b
5 2 7 b
6 3 9 c delay
7 3 9 c
8 3 9 c
9 4 30 d cold
10 4 30 d
11 4 30 d
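For comparison, here is a sketch of a different technique that needs neither groupby nor a Python-level function: repeat the index labels, then blank z4 wherever a label is a repeat. It is not part of the answer above, and it assumes the original index is unique. The variable names are illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "z2": [2, 7, 9, 30],
    "z3": ["a", "b", "c", "d"],
    "z4": ["fine", "good", "delay", "cold"],
})

# Each index label three times in a row: the original plus two copies.
repeated = df.loc[df.index.repeat(3)]

# duplicated() is False for the first occurrence of a label and True for the rest,
# so it marks exactly the copies (this relies on the original index being unique).
is_copy = repeated.index.duplicated()

# Blank z4 on the copies and renumber the rows.
result = repeated.assign(z4=repeated["z4"].mask(is_copy, "")).reset_index(drop=True)
print(result)
```

Because the repeated rows come out already grouped together, no sorting step is needed.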
Upvotes: 1