Reputation: 373
I have a dateframe (df),
df = pd.DataFrame({
'ID': ['James', 'James', 'James','Max', 'Max', 'Max', 'Max','Park','Tom', 'Tom', 'Tom', 'Tom','Wong'],
'From_num': [78, 420, 'Started', 298, 36, 298, 'Started', 'Started', 60, 520, 99, 'Started', 'Started'],
'To_num': [96, 78, 420, 36, 78, 36, 298, 311, 150, 520, 78, 99, 39],
'Date': ['2020-05-12', '2020-02-02', '2019-06-18',
'2019-06-20', '2019-01-30', '2018-10-23',
'2018-08-29', '2020-05-21', '2019-11-22',
'2019-08-26', '2018-12-11', '2018-10-09', '2019-02-01']})
And it is like this:
ID From_num To_num Date
0 James 78 96 2020-05-12
1 James 420 78 2020-02-02
2 James Started 420 2019-06-18
3 Max 298 36 2019-06-20
4 Max 36 78 2019-01-30
5 Max 298 36 2018-10-23
6 Max Started 298 2018-08-29
7 Park Started 311 2020-05-21
8 Tom 60 150 2019-11-22
9 Tom 520 520 2019-08-26
10 Tom 99 78 2018-12-11
11 Tom Started 99 2018-10-09
12 Wong Started 39 2019-02-01
For each person (group), I wish to create a new duplicate row on the first row within each group ('ID'), the values for the created row in column'ID', 'From_num' and 'To_num' should be the same as the previous first row, but the 'Date' value is the old 1st row's Date plus one day e.g. for James, the newly created row values is: 'James' '78' '96' '2020-05-13', same as the rest data, so my expected result is:
ID From_num To_num Date
0 James 78 96 2020-05-13 # row added, Date + 1
1 James 78 96 2020-05-12
2 James 420 78 2020-02-02
3 James Started 420 2019-06-18
4 Max 298 36 2019-06-21 # row added, Date + 1
5 Max 298 36 2019-06-20
6 Max 36 78 2019-01-30
7 Max 298 36 2018-10-23
8 Max Started 298 2018-08-29
9 Park Started 311 2020-05-22 # Row added, Date + 1
10 Park Started 311 2020-05-21
11 Tom 60 150 2019-11-23 # Row added, Date + 1
12 Tom 60 150 2019-11-22
13 Tom 520 520 2019-08-26
14 Tom 99 78 2018-12-11
15 Tom Started 99 2018-10-09
16 Wong Started 39 2019-02-02 # Row added Date + 1
17 Wong Started 39 2019-02-01
I wish to the order/sequence is same as my expected result. If you have any good ideas, please help. Many thanks
Upvotes: 3
Views: 125
Reputation: 71689
Use:
df['Date'] = pd.to_datetime(df['Date'])
df['order'] = df.groupby('ID').cumcount().add(1)
df1 = (
df.groupby('ID', as_index=False).first()
.assign(Date=lambda x: x['Date'] + pd.Timedelta(days=1), order=0)
)
df1 = pd.concat([df, df1]).sort_values(['ID', 'order'], ignore_index=True).drop('order', 1)
Details:
Convert the Date
column to pandas datetime
series and use DataFrame.groupby
on column ID
and groupby.cumcount
to impose the total ordering in each groups in the dataframe.
print(df)
ID From_num To_num Date order
0 James 78 96 2020-05-13 1
1 James 78 96 2020-05-12 2
2 James 420 78 2020-02-02 3
3 James Started 420 2019-06-18 4
4 Max 298 36 2019-06-21 1
5 Max 298 36 2019-06-20 2
6 Max 36 78 2019-01-30 3
7 Max 298 36 2018-10-23 4
8 Max Started 298 2018-08-29 5
9 Park Started 311 2020-05-22 1
10 Park Started 311 2020-05-21 2
11 Tom 60 150 2019-11-23 1
12 Tom 60 150 2019-11-22 2
13 Tom 520 520 2019-08-26 3
14 Tom 99 78 2018-12-11 4
15 Tom Started 99 2018-10-09 5
16 Wong Started 39 2019-02-02 1
17 Wong Started 39 2019-02-01 2
Create a new dataframe df1
by using DataFrame.groupby
on column ID
and aggregate using groupby.first
and assigning order=0
and incrementing Date
by pd.Timedelta
of 1 days
.
print(df1)
ID From_num To_num Date order
0 James 78 96 2020-05-14 0 # Date incremented by 1 days
1 Max 298 36 2019-06-22 0 # and ordering added
2 Park Started 311 2020-05-23 0
3 Tom 60 150 2019-11-24 0
4 Wong Started 39 2019-02-03 0
Using pd.concat
concat the dataframes df
and df1
and use DataFrame.sort_values
to sort the dataframe on columns ID
and order
.
print(df1)
ID From_num To_num Date
0 James 78 96 2020-05-14
1 James 78 96 2020-05-13
2 James 78 96 2020-05-12
3 James 420 78 2020-02-02
4 James Started 420 2019-06-18
5 Max 298 36 2019-06-22
6 Max 298 36 2019-06-21
7 Max 298 36 2019-06-20
8 Max 36 78 2019-01-30
9 Max 298 36 2018-10-23
10 Max Started 298 2018-08-29
11 Park Started 311 2020-05-23
12 Park Started 311 2020-05-22
13 Park Started 311 2020-05-21
14 Tom 60 150 2019-11-24
15 Tom 60 150 2019-11-23
16 Tom 60 150 2019-11-22
17 Tom 520 520 2019-08-26
18 Tom 99 78 2018-12-11
19 Tom Started 99 2018-10-09
20 Wong Started 39 2019-02-03
21 Wong Started 39 2019-02-02
22 Wong Started 39 2019-02-01
Upvotes: 1