Reputation: 7644
I have a DataFrame:
value  id
  100   1
  200   1
  300   1
  500   2
  600   2
  700   3
I want to group by id and add a row (with value 0) before the first row and after the last row of every group, so that my DataFrame looks like this:
value  id
    0   1
  100   1
  200   1
  300   1
    0   1
    0   2
  500   2
  600   2
    0   2
    0   3
  700   3
    0   3
Now, for every group of id, I want to add a sequence column so that:
value  id  sequence
    0   1         1
  100   1         2
  200   1         3
  300   1         4
    0   1         5
    0   2         1
  500   2         2
  600   2         3
    0   2         4
    0   3         1
  700   3         2
    0   3         3
The last part is easy, but how do I add the rows before and after every group?
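For reference, the sample data can be built like this (a minimal sketch, assuming plain integer columns):

import pandas as pd

# sample data from the first table above
df = pd.DataFrame({'value': [100, 200, 300, 500, 600, 700],
                   'id': [1, 1, 1, 2, 2, 3]})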
Upvotes: 5
Views: 1267
Reputation: 1459
If, like me, you are a fan of chaining DataFrame operations, check out this solution:
def add(df, column, value):
    # small helper so that adding a constant fits into a method chain via .pipe
    df[column] += value
    return df

(df
 .groupby('id')
 ["value"]
 .apply(lambda x: pd.Series([0] + x.tolist() + [0]))  # pad each group with a leading and trailing 0
 .reset_index()                                       # the per-group position becomes the "level_1" column
 .rename({"level_1": "sequence"}, axis=1)
 .pipe(add, column="sequence", value=1)               # make the sequence start at 1 instead of 0
)
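On the sample data from the question, this should produce something like (the column order comes out of reset_index):

    id  sequence  value
0    1         1      0
1    1         2    100
2    1         3    200
3    1         4    300
4    1         5      0
5    2         1      0
6    2         2    500
7    2         3    600
8    2         4      0
9    3         1      0
10   3         2    700
11   3         3      0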
This is also faster than the two other top-ranked answers.
edd313:

%%timeit
(df
 .groupby('id')
 ["value"]
 .apply(lambda x: pd.Series([0] + x.tolist() + [0]))
 .reset_index()
 .rename({"level_1": "sequence"}, axis=1)
 .pipe(add, column="sequence", value=1)
)
4.28 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

jezrael:

%%timeit
df2 = df.groupby('id').apply(f).reset_index(drop=True)
df2['seq'] = df2.groupby('id').cumcount() + 1
4.42 ms ± 47.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

cs95:

%%timeit
df2 = df.groupby('id')['value']\
        .apply(lambda x: pd.Series([0] + x.tolist() + [0]))\
        .reset_index().drop('level_1', 1)
df2['sequence'] = df2.groupby('id').cumcount() + 1
5.58 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
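As a quick sanity check (not part of the timings), you can verify that the approaches produce the same rows; a minimal sketch, assuming the add helper above and the f function from jezrael's answer are defined:

cols = ['value', 'id', 'sequence']

a = (df
     .groupby('id')["value"]
     .apply(lambda x: pd.Series([0] + x.tolist() + [0]))
     .reset_index()
     .rename({"level_1": "sequence"}, axis=1)
     .pipe(add, column="sequence", value=1))

b = df.groupby('id').apply(f).reset_index(drop=True)
b['sequence'] = b.groupby('id').cumcount() + 1

print(a[cols].equals(b[cols]))  # expected: True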
Upvotes: 0
Reputation: 109546
# pair each group's id with its values, padded with a leading and trailing 0
data = [zip([k] * (len(group) + 2), [0] + group.values.tolist() + [0])
        for k, group in df.groupby('id')['value']]
# flatten the per-group pairs back into a single DataFrame
df = pd.DataFrame([x for g in data for x in g], columns=['id', 'value'])
# number the rows within each group, starting at 1
df = df.assign(sequence=df.groupby(['id'])['value'].transform(
    lambda group: range(1, group.count() + 1)))
>>> df
    id  value  sequence
0    1      0         1
1    1    100         2
2    1    200         3
3    1    300         4
4    1      0         5
5    2      0         1
6    2    500         2
7    2    600         3
8    2      0         4
9    3      0         1
10   3    700         2
11   3      0         3
Upvotes: 2
Reputation: 862681
Not so easy:
def f(x):
    # prepend and append a [0, id] row to each group, keeping the original column order
    x = pd.DataFrame(np.concatenate([np.array([[0, x['id'].iat[0]]]),
                                     x.values,
                                     np.array([[0, x['id'].iat[0]]])]),
                     columns=x.columns)
    return x

df = df.groupby('id').apply(f).reset_index(drop=True)
# number the rows within each group
df['seq'] = df.groupby('id').cumcount() + 1
print(df)
    value  id  seq
0       0   1    1
1     100   1    2
2     200   1    3
3     300   1    4
4       0   1    5
5       0   2    1
6     500   2    2
7     600   2    3
8       0   2    4
9       0   3    1
10    700   3    2
11      0   3    3
Upvotes: 3
Reputation: 402523
Let's try groupby with an apply.
df = df.groupby('id')['value']\
       .apply(lambda x: pd.Series([0] + x.tolist() + [0]))\
       .reset_index().drop('level_1', axis=1)
df

    id  value
0    1      0
1    1    100
2    1    200
3    1    300
4    1      0
5    2      0
6    2    500
7    2    600
8    2      0
9    3      0
10   3    700
11   3      0
And now, use cumcount for the sequence.
df['sequence'] = df.groupby('id').cumcount() + 1
In [228]: df
Out[228]:
    id  value  sequence
0    1      0         1
1    1    100         2
2    1    200         3
3    1    300         4
4    1      0         5
5    2      0         1
6    2    500         2
7    2    600         3
8    2      0         4
9    3      0         1
10   3    700         2
11   3      0         3
Upvotes: 3