Reputation: 503

create variable based on multiple similar variables in R

My data looks like this (variables zipid1-zipid13 and variable hospid ranges from 1-13:

  zipid1 zipid2 zipid3 zipid4 zipid5 zipid6 zipid7 zipid8 zipid9 zipid10 zipid11 zipid12 zipid13 hospid local
1      0      0      0      0      1      0      0      0      0       0       0       0       0      5     0
2      0      0      1      0      1      0      0      0      0       0       0       0       0      5     0
3      0      0      0      0      0      0      1      0      0       0       0       0       0      5     0
4      0      0      1      0      0      0      0      0      0       0       0       0       0      5     0
5      0      0      1      0      1      0      0      0      0       0       0       0       0      5     0
6      0      0      0      0      1      0      0      0      0       0       0       0       0      5     0

How can I create a local variable = 1 when zipid1 ==1 & hospid =1, zipid2 == 1 & hospid == 2.etc. and else = 0 (ie, zipid = hospid)?

I tried ifelse but did not work well.

for (i in 1:13) {
name = paste0("zipid", i)
local$local <- with(local, ifelse(name == 1 & hospid == i, 1, 0))
}

Thanks!

Upvotes: 0

Answers (3)

Dan

Reputation: 12084

Here's a thought:

df$local <-  unlist(lapply(1:nrow(df), function(x)df[x, paste("zipid", df$hospid, sep = "")[x]]))

giving

#   zipid1 zipid2 zipid3 zipid4 zipid5 zipid6 zipid7 zipid8 zipid9 zipid10 zipid11 zipid12 zipid13 hospid local
# 1      0      0      0      0      1      0      0      0      0       0       0       0       0      5     1
# 2      0      0      1      0      1      0      0      0      0       0       0       0       0      5     1
# 3      0      0      0      0      0      0      1      0      0       0       0       0       0      5     0
# 4      0      0      1      0      0      0      0      0      0       0       0       0       0      5     0
# 5      0      0      1      0      1      0      0      0      0       0       0       0       0      5     1
# 6      0      0      0      0      1      0      0      0      0       0       0       0       0      5     1

They way this works is that I take the value in each row of hospid then pastes it with zipid to make something like zipid5. I look up the value in that particular column corresponding to the particular row and check if it's 1.

If there are NAs in the data frame these can be removed with na.omit. For example, df <- na.omit(df) prior to running the code above.

Upvotes: 1

Uwe

Reputation: 42592

The issue is that the column names zipid1, zipid2, etc convey payload data, i.e., the number.

My suggestion is to reshape the data from wide to long form, extract the number from the column name, match it with hospid, aggregate it by id, and merge the result with the original wide format.

Aggregation is done using toString() so that we get a valid result in case of multiple matches.

library(data.table)
# reshape from wide to long format
melt(setDT(DT), id.vars = c("id", "hospid"), variable.name = "zipid")[
  # turn column names into integer
  , zipid := as.integer(stringr::str_replace(zipid, "zipid", ""))][
    # if value is 1 and zipid and hospid do match then store number
    value == 1L & zipid == hospid, local := hospid][
      # aggregate only mathcing entries by id
      !is.na(local), .(local = toString(local)), by = id][
        # right join with original data
        DT, on = "id"][
          # change column order to meet OP's expectation
          , setcolorder(.SD, names(DT))]

   id zipid1 zipid2 zipid3 zipid4 zipid5 zipid6 zipid7 zipid8 zipid9 zipid10 zipid11 zipid12 zipid13 hospid local
1:  1      0      0      0      0      1      0      0      0      0       0       0       0       0      5     5
2:  2      0      0      1      0      1      0      0      0      0       0       0       0       0      5     5
3:  3      0      0      0      0      0      0      1      0      0       0       0       0       0      5  <NA>
4:  4      0      0      1      0      0      0      0      0      0       0       0       0       0      5  <NA>
5:  5      0      0      1      0      1      0      0      0      0       0       0       0       0      5     5
6:  6      0      0      0      0      1      0      0      0      0       0       0       0       0      5     5

Edit

By reshaping, the relevant information in DT can be condensed to

melt(setDT(DT), id.vars = c("id", "hospid"), variable.name = "zipid")[
  , zipid := as.integer(stringr::str_replace(zipid, "zipid", ""))][
    value == 1L]

   id hospid zipid value
1:  2      5     3     1
2:  4      5     3     1
3:  5      5     3     1
4:  1      5     5     1
5:  2      5     5     1
6:  5      5     5     1
7:  6      5     5     1
8:  3      5     7     1

The result is given by

melt(setDT(DT), id.vars = c("id", "hospid"), variable.name = "zipid")[
  , zipid := as.integer(stringr::str_replace(zipid, "zipid", ""))][
    value == 1L][
      zipid == hospid]

   id hospid zipid value
1:  1      5     5     1
2:  2      5     5     1
3:  5      5     5     1
4:  6      5     5     1

So, to combine this with the original data object we can do an update on join:

tmp <- 
  melt(setDT(DT), id.vars = c("id", "hospid"), variable.name = "zipid")[
    , zipid := as.integer(stringr::str_replace(zipid, "zipid", ""))][
      value == 1L & zipid == hospid]
DT[tmp, on = "id", local := value][]

   id zipid1 zipid2 zipid3 zipid4 zipid5 zipid6 zipid7 zipid8 zipid9 zipid10 zipid11 zipid12 zipid13 hospid local
1:  1      0      0      0      0      1      0      0      0      0       0       0       0       0      5     1
2:  2      0      0      1      0      1      0      0      0      0       0       0       0       0      5     1
3:  3      0      0      0      0      0      0      1      0      0       0       0       0       0      5    NA
4:  4      0      0      1      0      0      0      0      0      0       0       0       0       0      5    NA
5:  5      0      0      1      0      1      0      0      0      0       0       0       0       0      5     1
6:  6      0      0      0      0      1      0      0      0      0       0       0       0       0      5     1

This gives the expected output. No aggregation required.

Data

library(data.table)
DT <- fread("id        zipid1 zipid2 zipid3 zipid4 zipid5 zipid6 zipid7 zipid8 zipid9 zipid10 zipid11 zipid12 zipid13 hospid local
1      0      0      0      0      1      0      0      0      0       0       0       0       0      5     0
2      0      0      1      0      1      0      0      0      0       0       0       0       0      5     0
3      0      0      0      0      0      0      1      0      0       0       0       0       0      5     0
4      0      0      1      0      0      0      0      0      0       0       0       0       0      5     0
5      0      0      1      0      1      0      0      0      0       0       0       0       0      5     0
6      0      0      0      0      1      0      0      0      0       0       0       0       0      5     0", drop = "local")

Upvotes: 1

Marco De Virgilis

Reputation: 1089

name is a vector of strings and in that context are interpreted as strings and not actually as variables, try as ifelse(get(name)==1 &...

Upvotes: -1

create variable based on multiple similar variables in R

Answers (3)

Edit

Data

Related Questions