Reputation: 331
I have an email metadata table whose rows are already sorted, so each occurrence of "From" means that the entries that follow are the attributes of another email.
The column has a repeating pattern, as below:
==============
Tag
==============
From
Recepient
CC_Recepient
CC_Recepient
Subject
From
Recepient
CC_Recepient
Subject
From
Recepient
Subject
From
etc..
==============
I need to create a second column that holds a unique identifier for each email's group of entries, as below. The repeated occurrence of "From" is the only way I have to identify the start of the next group of entries.
<table><tbody><tr><th>Tag </th><th>Identifier</th></tr><tr><td>From </td><td>1</td></tr><tr><td>Recepient </td><td>1</td></tr><tr><td>CC_Recepient </td><td>1</td></tr><tr><td>CC_Recepient </td><td>1</td></tr><tr><td>Subject</td><td>1</td></tr><tr><td>From </td><td>2</td></tr><tr><td>Recepient</td><td>2</td></tr><tr><td>CC_Recepient</td><td>2</td></tr><tr><td>Subject</td><td>2</td></tr><tr><td>From</td><td>3 </td></tr><tr><td>Recepient</td><td>3</td></tr><tr><td>Subject</td><td>3</td></tr><tr><td>From</td><td>4</td></tr><tr><td>etc..</td><td> </td></tr></tbody></table>
Upvotes: 0
Views: 137
Reputation: 214957
You can check whether Tag is equal to "From" and then take the cumsum of that condition; each TRUE counts as 1, so the running total goes up by one at every "From":
df$Identifier <- cumsum(df$Tag == "From")
df
#            Tag Identifier
#1          From          1
#2     Recepient          1
#3  CC_Recepient          1
#4  CC_Recepient          1
#5       Subject          1
#6          From          2
#7     Recepient          2
#8  CC_Recepient          2
#9       Subject          2
#10         From          3
#11    Recepient          3
#12      Subject          3
#13         From          4
#14        etc..          4
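For anyone who wants to try this without the original table, here is a minimal self-contained sketch. The sample data frame is hypothetical and only mirrors the Tag column from the question. The trimws() call is an added assumption, not part of the one-liner above: it guards against padded values such as "From ", which a plain == comparison would miss.

```r
# Hypothetical sample data mirroring the Tag column in the question.
# The sixth value is deliberately padded ("From ") to show the trimws() guard.
df <- data.frame(
  Tag = c("From", "Recepient", "CC_Recepient", "CC_Recepient", "Subject",
          "From ", "Recepient", "CC_Recepient", "Subject",
          "From", "Recepient", "Subject"),
  stringsAsFactors = FALSE
)

# The comparison is TRUE on each row that starts a new email;
# cumsum() turns those TRUEs into a running group number.
df$Identifier <- cumsum(trimws(df$Tag) == "From")

# Count the rows that belong to each email group.
group_sizes <- table(df$Identifier)
print(df)
print(group_sizes)
```

Two things to keep in mind: any rows that appear before the first "From" get identifier 0, and an NA in Tag makes every identifier from that row onward NA, so it may be worth checking for both before relying on the result.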
Upvotes: 3