epiNS
epiNS

Reputation: 353

subset row in data.table by id

I feel there must be a simple data.table solution for this problem. I have the following data:

library(data.table)

data <- data.table(
    id = c(1,1,1,2,2,3),
    income_year0 = c(NA,NA,100,NA,200,NA),
    income_year1 = c(NA, 105, NA, 202,NA, 255),
    income_year2 = c(102, NA,NA,NA,NA,NA)
)

I want to for each unique id to create a new column income that takes the value in income_year0 (if not NA), otherwise the value in income_year1 (if not NA), otherwise the value in income_year2, and if all are NA, then income is NA.

That is, I want one row per id with one income column like so:

data_want <- data.table(
    id = c(1,2,3),
    income = c(100,200,255)
)

Upvotes: 9

Views: 419

Answers (5)

Uwe
Uwe

Reputation: 42544

For the sake of completeness, here is a version which uses fcoalesce():

data[, .(income = do.call(fcoalesce, as.list(unlist(.SD)))), by = id]
   id income
1:  1    100
2:  2    200
3:  3    255

Upvotes: 1

zx8754
zx8754

Reputation: 56034

We can reshape wide-to-long, then get first non-na row:

melt(data, id.vars = "id", na.rm = TRUE)[, .(income = first(value)), id]
#    id income
# 1:  1    100
# 2:  2    200
# 3:  3    255

Upvotes: 5

Anoushiravan R
Anoushiravan R

Reputation: 21908

I know this answer has got nothing to do with data.table , however, since I would like to challenge myself to alternative solutions, here is another one that might be of interest to you:

library(dplyr)
library(tidyr)

data %>%
  pivot_longer(-id, values_to = "income") %>%
  drop_na() %>%
  arrange(id, name) %>%
  group_by(id) %>%
  slice_head(n = 1) %>%
  select(- name)

# A tibble: 3 x 2
# Groups:   id [3]
     id income
  <dbl>  <dbl>
1     1    100
2     2    200
3     3    255

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388817

You can unlist the columns and select the first non-NA value.

library(data.table)
data[, .(income = na.omit(unlist(.SD))[1]), id]

#   id income
#1:  1    100
#2:  2    200
#3:  3    255

Upvotes: 7

ThomasIsCoding
ThomasIsCoding

Reputation: 101129

Another option with as.matrix + is.na

> data[, .(income = first(as.matrix(.SD)[!is.na(.SD)])), id]
   id income
1:  1    100
2:  2    200
3:  3    255

Upvotes: 5

Related Questions