Fill the actual data with duplicated data and remove duplicated data

Question

I have the following data:-

library(data.table)
data <- data <- data.table(address = c("AA", "BB", "AA", "CC", "DD", "EE", "DD"),
                   revenue = c(NA, 121, 22, 33, 44, 33, NA),
                   castord = c(21, 22, NA, 3, NA, 223, 33),
                   versaze = c(NA, 22, 124, 33, NA, 44, 43))
data

#    address revenue castord versaze
# 1:      AA      NA      21      NA
# 2:      BB     121      22      22
# 3:      AA      22      NA     124
# 4:      CC      33       3      33
# 5:      DD      44      NA      NA
# 6:      EE      33     223      44
# 7:      DD      NA      33      43

Now this data has AA and DD addresses duplicated. And their first occurrence has some NA data as you can see in the row 1 and 5. What I want to do is fill this data by using the duplicated rows of these addresses. If the duplicated row has NA values, then NA shouldn't replace the values in the row of first occurrence. This will give me the following data:-

    data <- data.table(address = c("AA", "BB", "AA", "CC", "DD", "EE", "DD"),
                   revenue = c(22, 121, 22, 33, 44, 33, NA),
                   castord = c(21, 22, NA, 3, 33, 223, 33),
                   versaze = c(124, 22, 124, 33, 43, 44, 43))

#    address revenue castord versaze
# 1:      AA      22      21     124
# 2:      BB     121      22      22
# 3:      AA      22      NA     124
# 4:      CC      33       3      33
# 5:      DD      44      33      43
# 6:      EE      33     223      44
# 7:      DD      NA      33      43

And then remove those duplicated rows:-

data <- data.table(address = c("AA", "BB", "CC", "DD", "EE"),
                   revenue = c(22, 121, 33, 44, 33),
                   castord = c(21, 22, 3, 33, 223),
                   versaze = c(124, 22, 33, 43, 44))

#    address revenue castord versaze
# 1:      AA      22      21     124
# 2:      BB     121      22      22
# 3:      CC      33       3      33
# 4:      DD      44      33      43
# 5:      EE      33     223      44

zx8754 · Accepted Answer

Using data.table, group by address, loop through columns, remove NAs, get the first value:

data[, lapply(.SD, function(i) (na.omit(i)[ 1 ])), by = address]
#    address revenue castord versaze
# 1:      AA      22      21     124
# 2:      BB     121      22      22
# 3:      CC      33       3      33
# 4:      DD      44      33      43
# 5:      EE      33     223      44

Fill the actual data with duplicated data and remove duplicated data

Answers (2)

Related Questions