Reputation: 841
I am trying to renumber grouped unique ids sequentially using dplyr in R.
There are six columns in the data frame, as shown below.
## Load package if necessary
library(tidyverse)
## Set data frame
df <- data.frame(
hid=c(10001,10001,10001,10001,10002,10002,10002,10002,10002,
10003,10003,10003,10003,10003,10003,10004,10004,10004,10004,10004),
mid=c(1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,1,2,3,4,5),
tmc=c(010,01010,0,01020,010,010,010,010,010,010,010,010,0,010,010,010,0,01010,010,01010),
thc=c(010,01010,0,02030,010,020,020,020,030,010,010,010,0,020,030,010,0,02020,030,04040),
mdc=c(000,01010,0,02020,000,010,010,010,010,000,000,010,0,010,020,000,0,02020,010,01010),
itc=c(010,01010,0,02020,020,020,020,020,020,010,010,010,0,020,020,010,0,02020,020,02020)
)
Unique ids are given to each row, grouped by the columns tmc, thc, mdc and itc.
## Add unique id grouped by tmc, thc, mdc and itc
df.id <- df %>% mutate(id=as.numeric(interaction(tmc,thc,mdc,itc)))
As it does not give sequential ids, I need to renumber it. However, I could not find a solution for that. The conditions are:
- If tmc, thc, mdc and itc are all 0, id is set to 0 (I do not know the reason, but interaction gives 1 for such records in my data frame).
- Otherwise, ids are renumbered sequentially (for example, if ids are set as 4,8,2,2,8, they should be renamed as 1,2,3,3,2).
The following script shows what I am doing currently. id is the temporary id obtained from the interaction function, but I need to obtain the sequential id indicated in the id.desired column.
## Replace unique id sequentially
## IT DOES NOT GIVE DESIRED OUTPUT
# df.id %>% group_by(id) %>% mutate(id2=seq_along(id))
## Desired id is shown in `id.desired`
## `id` is the ones obtained from `interaction` function, which are not set sequentially
hid mid tmc thc mdc itc id id.desired
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 10001 1 10 10 0 10 166 1
2 10001 2 1010 1010 1010 1010 595 2
3 10001 3 0 0 0 0 1 0
4 10001 4 1020 2030 2020 2020 796 3
5 10002 1 10 10 0 20 326 4
6 10002 2 10 20 10 20 362 5
7 10002 3 10 20 10 20 362 5
8 10002 4 10 20 10 20 362 5
9 10002 5 10 30 10 20 366 6
10 10003 1 10 10 0 10 166 1
11 10003 2 10 10 0 10 166 1
12 10003 3 10 10 10 10 198 7
13 10003 4 0 0 0 0 1 0
14 10003 5 10 20 10 20 362 5
15 10003 6 10 30 20 20 398 8
16 10004 1 10 10 0 10 166 1
17 10004 2 0 0 0 0 1 0
18 10004 3 1010 2020 2020 2020 791 9
19 10004 4 10 30 10 20 366 6
20 10004 5 1010 4040 1010 2020 767 10
Any suggestions? I would prefer to use dplyr for this operation.
I received some suggestions in a previous question, but the structure is different in this case (the dummy field does not exist in the current data frame):
How to renumber result of intersection/group_indices in R?
Upvotes: 2
Views: 2999
Reputation: 39154
A solution using the tidyverse. Notice that I did not use the interaction function. Instead, I used the group_indices function from dplyr to create the group index, then converted it to a factor and set the levels based on the order of first occurrence in the column. df2 is the final output.
library(tidyverse)
df2 <- df %>%
filter_at(vars(tmc, thc, mdc, itc), any_vars(. != 0)) %>%
mutate(id = group_indices(., tmc, thc, mdc, itc)) %>%
mutate(id = as.numeric(factor(id, levels = unique(id)))) %>%
left_join(df, ., by = names(df)) %>%
replace_na(list(id = 0))
df2
# hid mid tmc thc mdc itc id
# 1 10001 1 10 10 0 10 1
# 2 10001 2 1010 1010 1010 1010 2
# 3 10001 3 0 0 0 0 0
# 4 10001 4 1020 2030 2020 2020 3
# 5 10002 1 10 10 0 20 4
# 6 10002 2 10 20 10 20 5
# 7 10002 3 10 20 10 20 5
# 8 10002 4 10 20 10 20 5
# 9 10002 5 10 30 10 20 6
# 10 10003 1 10 10 0 10 1
# 11 10003 2 10 10 0 10 1
# 12 10003 3 10 10 10 10 7
# 13 10003 4 0 0 0 0 0
# 14 10003 5 10 20 10 20 5
# 15 10003 6 10 30 20 20 8
# 16 10004 1 10 10 0 10 1
# 17 10004 2 0 0 0 0 0
# 18 10004 3 1010 2020 2020 2020 9
# 19 10004 4 10 30 10 20 6
# 20 10004 5 1010 4040 1010 2020 10
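As far as I know, calling group_indices() with column arguments is deprecated in recent dplyr releases and filter_at() is superseded, so here is a minimal sketch of the same idea that avoids both. It is not the code above, just one possible variant: the toy data frame and the helper columns all_zero and key are made up for illustration, and the numbering by first appearance is done with base R's match() and unique().

```r
library(dplyr)

# Hypothetical toy data with the same grouping columns as the question
toy <- data.frame(
  tmc = c(10, 1010, 0, 10, 10, 0),
  thc = c(10, 1010, 0, 20, 10, 0),
  mdc = c(0, 1010, 0, 10, 0, 0),
  itc = c(10, 1010, 0, 20, 10, 0)
)

toy_id <- toy %>%
  mutate(
    # TRUE only when all four grouping columns are 0
    all_zero = tmc == 0 & thc == 0 & mdc == 0 & itc == 0,
    # One string per combination of the four columns
    key = paste(tmc, thc, mdc, itc, sep = "_"),
    # Position of each key among the distinct non-zero keys, in order of
    # first appearance; all-zero rows get 0
    id = if_else(all_zero, 0L, match(key, unique(key[!all_zero])))
  ) %>%
  select(-all_zero, -key)

toy_id$id
# 1 2 0 3 1 0
```

Applied to the question's df, the same mutate() should reproduce the id column shown above, since both approaches number the groups in order of first appearance.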
Upvotes: 3
Reputation: 361
(edited mutate based on your clarification in comments)
Here are the two things I tried to do: set id to 0 when tmc, thc, mdc and itc are all 0, and, for id.desired, use the dense_rank() function. Here is the code based on the dataset you shared:
df %>%
mutate(id = if_else(tmc == 0 & thc == 0 & mdc == 0 & itc == 0, 0,
as.numeric(interaction(tmc, thc, mdc, itc, lex.order = TRUE)))) %>%
mutate(id.desired = dense_rank(id) - 1)
The output looks like this:
hid mid tmc thc mdc itc id id.desired
1 10001 1 10 10 0 10 227 1
2 10001 2 1010 1010 1010 1010 519 7
3 10001 3 0 0 0 0 0 0
4 10001 4 1020 2030 2020 2020 775 10
5 10002 1 10 10 0 20 228 2
6 10002 2 10 20 10 20 258 4
7 10002 3 10 20 10 20 258 4
8 10002 4 10 20 10 20 258 4
9 10002 5 10 30 10 20 283 5
10 10003 1 10 10 0 10 227 1
11 10003 2 10 10 0 10 227 1
12 10003 3 10 10 10 10 232 3
13 10003 4 0 0 0 0 0 0
14 10003 5 10 20 10 20 258 4
15 10003 6 10 30 20 20 288 6
16 10004 1 10 10 0 10 227 1
17 10004 2 0 0 0 0 0 0
18 10004 3 1010 2020 2020 2020 550 8
19 10004 4 10 30 10 20 283 5
20 10004 5 1010 4040 1010 2020 595 9
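Note that the ids above follow the sorted order of the interaction codes, not the order in which each combination first appears, which is why they differ from the id.desired column in the question. Here is a small sketch of the difference, using made-up vectors a and b rather than the question's data. It also shows why the raw codes have gaps: interaction() creates one level for every combination of the input levels, whether or not that combination occurs.

```r
library(dplyr)

# Hypothetical toy vectors, not the question's data
a <- c(10, 0, 10, 20, 10)
b <- c(7, 0, 5, 5, 7)

f <- interaction(a, b, lex.order = TRUE)
nlevels(f)
# 9: all 3 x 3 combinations of levels, although only 4 of them occur

code <- as.numeric(f)
code
# 6 1 5 8 6  (the all-zero combination is the first level, hence 1)

# dense_rank() closes the gaps, numbering by sorted code
by_value <- dense_rank(code)
by_value
# 3 1 2 4 3

# match() against unique() numbers by first appearance instead
by_appearance <- match(code, unique(code))
by_appearance
# 1 2 3 4 1
```

So if the ids should follow the order of first appearance, replacing dense_rank(id) with something like match(id, unique(id)) is one option, with the all-zero rows still handled separately.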
Upvotes: 0
Reputation: 157
Not sure how to interpret the id.desired column, but here is an example based on the two conditions and using data.table:
require(data.table)
df = data.table(df)
df[tmc != 0 & thc != 0 & mdc != 0 & itc != 0, ID := 1:.N, by = .(tmc, thc, mdc, itc)]
df[is.na(ID), ID := 0]
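If id.desired is meant to be one id per combination, a variant along the following lines might be closer. This is only a sketch under that assumption, on a made-up toy table: it uses .GRP (the group counter) instead of 1:.N (which numbers rows within each group), and it excludes only rows where all four columns are 0, whereas the filter above also drops rows where any single column is 0.

```r
library(data.table)

# Hypothetical toy data with the same grouping columns as the question
toy <- data.table(
  tmc = c(10, 1010, 0, 10, 10, 0),
  thc = c(10, 1010, 0, 20, 10, 0),
  mdc = c(0, 1010, 0, 10, 0, 0),
  itc = c(10, 1010, 0, 20, 10, 0)
)

# Start with 0 everywhere, so all-zero rows keep id 0
toy[, ID := 0L]

# For the remaining rows, .GRP numbers the groups; with `by` (not `keyby`)
# groups are taken in order of first appearance
toy[!(tmc == 0 & thc == 0 & mdc == 0 & itc == 0),
    ID := .GRP,
    by = .(tmc, thc, mdc, itc)]

toy$ID
# 1 2 0 3 1 0
```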
Upvotes: 0