Shubham R

Reputation: 7644

Add row before first and after last position of every group in pandas

I have a dataframe:

value    id
100       1
200       1
300       1
500       2
600       2
700       3

I want to group by id and add a row before the first row and after the last row of every group. The added rows have value 0, so my dataframe should look like:

value    id
0         1
100       1
200       1
300       1
0         1
0         2
500       2
600       2
0         2
0         3
700       3
0         3

Now, for every group of id, I want to add a sequence column, such that:

value    id    sequence
0         1    1
100       1    2
200       1    3
300       1    4
0         1    5
0         2    1
500       2    2
600       2    3
0         2    4
0         3    1
700       3    2
0         3    3

The last part is easy, but how do I add rows before and after every group?
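For reference, the input frame and the desired result can be reproduced with a minimal per-group sketch (a plain loop over groups with pd.concat, not vectorised, and not taken from any particular answer):

```python
import pandas as pd

# The input frame from the question
df = pd.DataFrame({"value": [100, 200, 300, 500, 600, 700],
                   "id": [1, 1, 1, 2, 2, 3]})

# Pad each group with a zero-value sentinel row on both sides
parts = []
for gid, group in df.groupby("id"):
    pad = pd.DataFrame({"value": [0], "id": [gid]})
    parts.append(pd.concat([pad, group, pad], ignore_index=True))
out = pd.concat(parts, ignore_index=True)

# 1-based sequence number within each id group
out["sequence"] = out.groupby("id").cumcount() + 1
```

This is the brute-force baseline; the answers below do the same thing more idiomatically.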

Upvotes: 5

Views: 1267

Answers (4)

edd313

Reputation: 1459

If, like me, you are a fan of chaining dataframe operations, check out this solution:

def add(df, column, value):
    # Shift a column in place; used via .pipe to make the sequence 1-based
    df[column] += value
    return df

(df
 .groupby('id')
 ["value"]
 .apply(lambda x: pd.Series([0] + x.tolist() + [0]))
 .reset_index()
 .rename({"level_1": "sequence"}, axis=1)
 .pipe(add, column="sequence", value=1)
)
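To see where the `level_1` column being renamed comes from: returning a Series from `groupby.apply` produces a MultiIndex whose outer level is the group key and whose inner (unnamed) level is the position within each group, which `reset_index()` exposes as `level_1`. A small trace on a cut-down frame:

```python
import pandas as pd

df = pd.DataFrame({"value": [100, 200, 500], "id": [1, 1, 2]})

# Each group becomes a Series indexed 0..n+1; the group key forms level 0
# of the combined MultiIndex, the within-group position forms level 1.
s = df.groupby("id")["value"].apply(lambda x: pd.Series([0] + x.tolist() + [0]))
flat = s.reset_index()  # columns: id, level_1, value
```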

This is also faster than the two other top-ranked answers.

edd313

%%timeit
(df
 .groupby('id')
 ["value"]
 .apply(lambda x: pd.Series([0] + x.tolist() + [0]))
 .reset_index()
 .rename({"level_1": "sequence"}, axis=1)
 .pipe(add, column="sequence", value=1)
)

4.28 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

jezrael

%%timeit
df2 = df.groupby('id').apply(f).reset_index(drop=True)

df2['seq'] = df2.groupby('id').cumcount() + 1

4.42 ms ± 47.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

cs95

%%timeit
df2 = df.groupby('id')['value']\
        .apply(lambda x: pd.Series([0] + x.tolist() + [0]))\
        .reset_index().drop('level_1', axis=1)

df2['sequence'] = df2.groupby('id').cumcount() + 1

5.58 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Upvotes: 0

Alexander

Reputation: 109546

data = [zip([k] * (len(group) + 2), [0] + group.values.tolist() + [0]) 
        for k, group in df.groupby('id')['value']]
df = pd.DataFrame([x for g in data for x in g], columns=['id', 'value'])
# assign returns a new frame, so capture the result
df = df.assign(sequence=df.groupby(['id'])['value'].transform(
    lambda group: range(1, group.count() + 1)))
>>> df
    id  value  sequence
0    1      0         1
1    1    100         2
2    1    200         3
3    1    300         4
4    1      0         5
5    2      0         1
6    2    500         2
7    2    600         3
8    2      0         4
9    3      0         1
10   3    700         2
11   3      0         3

Upvotes: 2

jezrael

Reputation: 862681

Not so easy:

def f(x):
    # requires: import numpy as np
    pad = np.array([[0, x['id'].iat[0]]])  # one [value=0, id] sentinel row
    x = pd.DataFrame(np.concatenate([pad, x.values, pad]), columns=x.columns)
    return x

df = df.groupby('id').apply(f).reset_index(drop=True)

df['seq'] = df.groupby('id').cumcount() + 1
print (df)
    value  id  seq
0       0   1    1
1     100   1    2
2     200   1    3
3     300   1    4
4       0   1    5
5       0   2    1
6     500   2    2
7     600   2    3
8       0   2    4
9       0   3    1
10    700   3    2
11      0   3    3
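To see what `f` does to a single group, here is a minimal trace of the `np.concatenate` step in isolation (same idea, standalone names):

```python
import numpy as np
import pandas as pd

# One group of the original frame (columns in [value, id] order)
group = pd.DataFrame({"value": [100, 200], "id": [1, 1]})

# Stack a [0, id] sentinel row above and below the group's rows
sentinel = np.array([[0, group["id"].iat[0]]])
padded = np.concatenate([sentinel, group.values, sentinel])
```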

Upvotes: 3

cs95

Reputation: 402523

Let's try groupby with an apply.

df = df.groupby('id')['value']\
         .apply(lambda x: pd.Series([0] + x.tolist() + [0]))\
         .reset_index().drop('level_1', axis=1)
df 
    id  value
0    1      0
1    1    100
2    1    200
3    1    300
4    1      0
5    2      0
6    2    500
7    2    600
8    2      0
9    3      0
10   3    700
11   3      0

And now, use cumcount for the sequence.

df['sequence'] = df.groupby('id').cumcount() + 1

In [228]: df
Out[228]: 
    id  value  sequence
0    1      0         1
1    1    100         2
2    1    200         3
3    1    300         4
4    1      0         5
5    2      0         1
6    2    500         2
7    2    600         3
8    2      0         4
9    3      0         1
10   3    700         2
11   3      0         3
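As a quick illustration of the `cumcount` step on its own: it numbers rows 0, 1, 2, ... within each group, so adding 1 gives the desired 1-based sequence.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 1, 2, 2]})
seq = df.groupby("id").cumcount() + 1  # restarts at 1 for each id
```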

Upvotes: 3
