Reputation: 335
A sample of my dataset looks like this:
fact_code line_code date style buyer smv
1004 100401 2013-02-02 TS-334 2 7.15
1004 100401 2013-02-03 TS-334 2 7.15
1008 100801 2013-04-12 SST 8 8.58
1008 100801 2013-04-13 SST 8 8.58
... ... ... ... ... ...
1008 100801 2013-07-31 SST 2 9.20
1008 100801 2013-08-01 SST 2 9.20
... ... ... ... ... ...
1008 100802 2013-08-01 SST 2 9.20
1008 100802 2013-08-02 SST 2 9.20
... ... ... ... ... ...
1008 100802 2013-08-12 SST 2 9.20
Variables fact_code
, line_code
are factory and line code respectively. style
is the garment style produced by a given line_code
on a certain date
for a given buyer
.
I aim to create a variable called spell number for each garment (for eg. T-shirts) style, spell_num
, that is produced by a factory line. The variable and the data would look something like this:
fact_code line_code date style buyer smv spell_num
1004 100401 2013-02-02 TS-334 2 7.15 TS-334_1
1004 100401 2013-02-03 TS-334 2 7.15 TS-334_1
1008 100801 2013-04-12 SST 8 8.58 SST_1
1008 100801 2013-04-13 SST 8 8.58 SST_1
... ... ... ... ... ... ...
1008 100801 2013-07-31 SST 2 9.20 SST_1
1008 100801 2013-08-01 SST 2 9.20 SST_1
... ... ... ... ... ... ...
1008 100802 2013-08-01 SST 2 9.20 SST_2
1008 100802 2013-08-02 SST 2 9.20 SST_2
... ... ... ... ... ... ...
1008 100802 2013-08-12 SST 2 9.20 SST_2
The purpose of the variable, spell_num
is to indicate the current spell number of each garment style that is produced by factory line. So, for example, TS-334_1 indicates first spell of production for the garment style TS-334. Similarly, SST_1 indicates the first spell for the style SST, whereas SST_2 indicates its second spell of production. Therefore, whenever a different line produces a style, it indicates a new spell for that same style. The underscore partitions the style name on its left and spell number/indicator on its right.
To this end, this is the code that I have tried among others:
mydf <- mydf %>%
dplyr::group_by(line_code, style) %>%
dplyr::mutate(spell_num = paste(style, as.integer(row_number() == 1L), sep = "_"))
where mydf
is the name of the pertinent data frame.
And below is the resultant output:
fact_code line_code date style buyer smv spell_num
1004 100401 2013-02-02 TS-334 2 7.15 TS-334_1
1004 100401 2013-02-03 TS-334 2 7.15 TS-334_0
1008 100801 2013-04-12 SST 8 8.58 SST_1
1008 100801 2013-04-13 SST 8 8.58 SST_0
... ... ... ... ... ... ...
1008 100801 2013-07-31 SST 2 9.20 SST_0
1008 100801 2013-08-01 SST 2 9.20 SST_0
... ... ... ... ... ... ...
1008 100802 2013-08-01 SST 2 9.20 SST_1
1008 100802 2013-08-02 SST 2 9.20 SST_0
... ... ... ... ... ... ...
1008 100802 2013-08-12 SST 2 9.20 SST_0
As can be seen, the above R code, however, is not yielding the expected values for the variable, spell_num
. For example, instead of the values SST_1, SST_0,...,SST_0 for style SST produced by line 100801 from 2013-04-13 till 2013-08-01, I want it to be rather SST_1, SST_1,...,SST_1, which shows that the style is currently is in its first spell of production.
The spell_num
changes to SST_2 i.e. the second spell of production for style SST, when it is produced by a different factory line, for example, 100802.
Therefore, my question is how can I get the expected values for the variable spell_num
given the above dataset and description?
Any help would be appreciated.
Upvotes: 0
Views: 100
Reputation: 2431
library(tidyverse)
mydf %>% nest(-style, -line_code) %>%
group_by(style) %>%
mutate(index = row_number()) %>%
unnest %>%
mutate(spell_num = paste(style, index, sep = "_"))
Upvotes: 1
Reputation: 145755
With data.table
.
setDT(mydf) # convert to data.table
setkey(mydf, style, linecode) # set key to ensure sorting
mydf[, spell_num := paste(style, rleid(line_code), sep = "_"), by = .(style)]
Upvotes: 3