Reputation: 1284
In stackoverflow there are several post asking about how to tagging specific combinations of levels among columns. However, I can't find how to do this but without considering the NA
s. As an example:
df <- read.table(header=TRUE, text="
cat val
1 aaa 0.05638315
2 aaa 0.25767250
3 aaa NA
4 aaa 0.46854928
5 aaa 0.55232243
6 bbb NA
7 bbb 0.37032054
8 bbb 0.48377074
9 bbb 0.54655860
10 bbb 0.81240262
11 ccc 0.28035384
12 ccc 0.39848790
13 ccc 0.62499648
14 ccc 0.76255108
15 ccc 0.88216552")
I would expect this:
df
cat val id
1 aaa 0.05638315 1
2 aaa 0.25767250 2
3 aaa NA NA
4 aaa 0.46854928 3
5 aaa 0.55232243 4
6 bbb NA NA
7 bbb 0.37032054 1
8 bbb 0.48377074 2
9 bbb 0.54655860 3
10 bbb 0.81240262 4
11 ccc 0.28035384 1
12 ccc 0.39848790 2
13 ccc 0.62499648 3
14 ccc 0.76255108 4
15 ccc 0.88216552 5
Possible codes are those, however, they do not contemplate NAs
:
library(dplyr)
df <- df %>% group_by(cat) %>% mutate(id = row_number())
head(df)
# A tibble: 6 x 3
# Groups: cat [2]
cat val id
<fct> <dbl> <int>
1 aaa 0.0564 1
2 aaa 0.258 2
3 aaa NA 3
4 aaa 0.469 4
5 aaa 0.552 5
6 bbb NA 1
library(data.table)
DT <- data.table(df)
DT[, id := seq_len(.N), by = cat]
df <- DT[, id := rowid(cat)]
head(df)
cat val id
1: aaa 0.05638315 1
2: aaa 0.25767250 2
3: aaa NA 3
4: aaa 0.46854928 4
5: aaa 0.55232243 5
6: bbb NA 1
Also, I wonder how to do if what I want is to start numbering from the first record, and from that, we numbering without considering the NA
s. An example:
df2 <- read.table(header=TRUE, text="
cat val
1 aaa 0.05638315
2 aaa 0.25767250
3 aaa NA
4 aaa 0.46854928
5 aaa 0.55232243
6 bbb NA
7 bbb 0.37032054
8 bbb 0.48377074
9 bbb NA
10 bbb 0.81240262
11 ccc 0.28035384
12 ccc 0.39848790
13 ccc 0.62499648
14 ccc 0.76255108
15 ccc 0.88216552")
I would expect that:
df2
cat val id
1 aaa 0.056383 1
2 aaa 0.257673 2
3 aaa NA 3
4 aaa 0.468549 4
5 aaa 0.552322 5
6 bbb NA NA
7 bbb 0.370321 1
8 bbb 0.483771 2
9 bbb NA 3
10 bbb 0.812403 4
11 ccc 0.280354 1
12 ccc 0.398488 2
13 ccc 0.624996 3
14 ccc 0.762551 4
15 ccc 0.882166 5
I would appreciate any help.
Upvotes: 2
Views: 96
Reputation: 388817
You can tweak the suggestions that you have to get the desired output :
library(dplyr)
df %>%
group_by(cat) %>%
mutate(id = replace(row_number() - cumsum(is.na(val)), is.na(val), NA))
# cat val id
# <chr> <dbl> <int>
# 1 aaa 0.0564 1
# 2 aaa 0.258 2
# 3 aaa NA NA
# 4 aaa 0.469 3
# 5 aaa 0.552 4
# 6 bbb NA NA
# 7 bbb 0.370 1
# 8 bbb 0.484 2
# 9 bbb 0.547 3
#10 bbb 0.812 4
#11 ccc 0.280 1
#12 ccc 0.398 2
#13 ccc 0.625 3
#14 ccc 0.763 4
#15 ccc 0.882 5
This can be also be written in data.table
library(data.table)
setDT(df)[, id:= replace(seq_len(.N) - cumsum(is.na(val)), is.na(val), NA), cat]
and base R :
df$id <- with(df, replace(ave(val, cat, FUN = function(x)
seq_along(x) - cumsum(is.na(x))), is.na(val), NA))
The logic is to create an id
variable which is current row number in the group minus the number of NA
's that have occurred till that time.
For updated df2
to consider the NA
only in first row we can use :
df2 %>%
group_by(cat) %>%
mutate(id = replace(row_number() - is.na(first(val)),1* is.na(first(val)), NA))
# cat val id
# <chr> <dbl> <int>
# 1 aaa 0.0564 1
# 2 aaa 0.258 2
# 3 aaa NA 3
# 4 aaa 0.469 4
# 5 aaa 0.552 5
# 6 bbb NA NA
# 7 bbb 0.370 1
# 8 bbb 0.484 2
# 9 bbb NA 3
#10 bbb 0.812 4
#11 ccc 0.280 1
#12 ccc 0.398 2
#13 ccc 0.625 3
#14 ccc 0.763 4
#15 ccc 0.882 5
Upvotes: 2
Reputation: 16090
You can restrict a :=
assignment to certain rows using the first argument of [.data.table
:
setDT(df)
df[complete.cases(val), id := seq_len(.N), by = .(cat)]
Upvotes: 2