Reputation: 353
I feel there must be a simple data.table
solution for this problem. I have the following data:
library(data.table)
data <- data.table(
id = c(1,1,1,2,2,3),
income_year0 = c(NA,NA,100,NA,200,NA),
income_year1 = c(NA, 105, NA, 202,NA, 255),
income_year2 = c(102, NA,NA,NA,NA,NA)
)
I want to for each unique id to create a new column income that takes the value in income_year0
(if not NA
), otherwise the value in income_year1
(if not NA), otherwise the value in income_year2
, and if all are NA, then income is NA.
That is, I want one row per id with one income column like so:
data_want <- data.table(
id = c(1,2,3),
income = c(100,200,255)
)
Upvotes: 9
Views: 419
Reputation: 42544
For the sake of completeness, here is a version which uses fcoalesce()
:
data[, .(income = do.call(fcoalesce, as.list(unlist(.SD)))), by = id]
id income 1: 1 100 2: 2 200 3: 3 255
Upvotes: 1
Reputation: 56034
We can reshape wide-to-long, then get first non-na row:
melt(data, id.vars = "id", na.rm = TRUE)[, .(income = first(value)), id]
# id income
# 1: 1 100
# 2: 2 200
# 3: 3 255
Upvotes: 5
Reputation: 21908
I know this answer has got nothing to do with data.table
, however, since I would like to challenge myself to alternative solutions, here is another one that might be of interest to you:
library(dplyr)
library(tidyr)
data %>%
pivot_longer(-id, values_to = "income") %>%
drop_na() %>%
arrange(id, name) %>%
group_by(id) %>%
slice_head(n = 1) %>%
select(- name)
# A tibble: 3 x 2
# Groups: id [3]
id income
<dbl> <dbl>
1 1 100
2 2 200
3 3 255
Upvotes: 1
Reputation: 388817
You can unlist the columns and select the first non-NA value.
library(data.table)
data[, .(income = na.omit(unlist(.SD))[1]), id]
# id income
#1: 1 100
#2: 2 200
#3: 3 255
Upvotes: 7
Reputation: 101129
Another option with as.matrix
+ is.na
> data[, .(income = first(as.matrix(.SD)[!is.na(.SD)])), id]
id income
1: 1 100
2: 2 200
3: 3 255
Upvotes: 5