Reputation: 205
I have a data like this, and want to create a column called 'Month'
+---------+------------------+------+------+
| Name | Task | Team | Date |
+---------+------------------+------+------+
| John | Market study | A | 1 |
+---------+------------------+------+------+
| Michael | Customer service | B | 1 |
+---------+------------------+------+------+
| Joanna | Accounting | C | 1 |
+---------+------------------+------+------+
| John | Accounting | B | 2 |
+---------+------------------+------+------+
| Michael | Customer service | A | 2 |
+---------+------------------+------+------+
| Joanna | Market study | C | 2 |
+---------+------------------+------+------+
| John | Customer service | C | 1 |
+---------+------------------+------+------+
| Michael | Market study | A | 1 |
+---------+------------------+------+------+
| Joanna | Customer service | B | 1 |
+---------+------------------+------+------+
| John | Market study | A | 2 |
+---------+------------------+------+------+
| Michael | Customer service | B | 2 |
+---------+------------------+------+------+
| Joanna | Accounting | C | 2 |
+---------+------------------+------+------+
So basicly, I have the date information, but the date does not contain the month it belongs to. However, I know that if it occurs the first time, then it will belong the Month 1, if it occurs for the second time, then it will belong to Month 2. So for example, date 1 occurs 3 time, then interrupted by date 2. So the first 3 time belongs to month 1, and the next 3 times occur, it belong to Month 2. So I would like my result like this:
+---------+------------------+------+------+---------+
| Name | Task | Team | Date | Month |
+---------+------------------+------+------+---------+
| John | Market study | A | 1 | Month 1 |
+---------+------------------+------+------+---------+
| Michael | Customer service | B | 1 | Month 1 |
+---------+------------------+------+------+---------+
| Joanna | Accounting | C | 1 | Month 1 |
+---------+------------------+------+------+---------+
| John | Accounting | B | 2 | Month 1 |
+---------+------------------+------+------+---------+
| Michael | Customer service | A | 2 | Month 1 |
+---------+------------------+------+------+---------+
| Joanna | Market study | C | 2 | Month 1 |
+---------+------------------+------+------+---------+
| John | Customer service | C | 1 | Month 2 |
+---------+------------------+------+------+---------+
| Michael | Market study | A | 1 | Month 2 |
+---------+------------------+------+------+---------+
| Joanna | Customer service | B | 1 | Month 2 |
+---------+------------------+------+------+---------+
| John | Market study | A | 2 | Month 2 |
+---------+------------------+------+------+---------+
| Michael | Customer service | B | 2 | Month 2 |
+---------+------------------+------+------+---------+
| Joanna | Accounting | C | 2 | Month 2 |
+---------+------------------+------+------+---------+
I do not have any idea, other than just using some loops. Thank you all.
Upvotes: 1
Views: 54
Reputation: 25239
If I understand the question correctly, you may do following: create mask s
to separate each consective values into separate group. From s
, create mask s1
to each value of each group. Groupby s1
and Date
and doing cumcount
and map
to create desired output:
s = df.Date.ne(df.Date.shift()).cumsum()
s1 = df.Date.groupby(s).cumcount()
df['Month'] = df.groupby([s1, 'Date']).Name.cumcount().add(1).map(lambda x: 'Month '+str(x))
Out[897]:
Name Task Team Date Month
0 John Market-study A 1 Month 1
1 Michael Customer-service B 1 Month 1
2 Joanna Accounting C 1 Month 1
3 John Accounting B 2 Month 1
4 Michael Customer-service A 2 Month 1
5 Joanna Market-study C 2 Month 1
6 John Customer-service C 1 Month 2
7 Michael Market-study A 1 Month 2
8 Joanna Customer-service B 1 Month 2
9 John Market-study A 2 Month 2
10 Michael Customer-service B 2 Month 2
11 Joanna Accounting C 2 Month 2
Upvotes: 1