Reputation: 503
My data looks like this (variables zipid1-zipid13
and variable hospid
ranges from 1-13:
zipid1 zipid2 zipid3 zipid4 zipid5 zipid6 zipid7 zipid8 zipid9 zipid10 zipid11 zipid12 zipid13 hospid local
1 0 0 0 0 1 0 0 0 0 0 0 0 0 5 0
2 0 0 1 0 1 0 0 0 0 0 0 0 0 5 0
3 0 0 0 0 0 0 1 0 0 0 0 0 0 5 0
4 0 0 1 0 0 0 0 0 0 0 0 0 0 5 0
5 0 0 1 0 1 0 0 0 0 0 0 0 0 5 0
6 0 0 0 0 1 0 0 0 0 0 0 0 0 5 0
How can I create a local variable = 1 when zipid1 ==1 & hospid =1, zipid2 == 1 & hospid == 2
.etc. and else = 0 (ie, zipid = hospid)?
I tried ifelse
but did not work well.
for (i in 1:13) {
name = paste0("zipid", i)
local$local <- with(local, ifelse(name == 1 & hospid == i, 1, 0))
}
Thanks!
Upvotes: 0
Views: 80
Reputation: 12084
Here's a thought:
df$local <- unlist(lapply(1:nrow(df), function(x)df[x, paste("zipid", df$hospid, sep = "")[x]]))
giving
# zipid1 zipid2 zipid3 zipid4 zipid5 zipid6 zipid7 zipid8 zipid9 zipid10 zipid11 zipid12 zipid13 hospid local
# 1 0 0 0 0 1 0 0 0 0 0 0 0 0 5 1
# 2 0 0 1 0 1 0 0 0 0 0 0 0 0 5 1
# 3 0 0 0 0 0 0 1 0 0 0 0 0 0 5 0
# 4 0 0 1 0 0 0 0 0 0 0 0 0 0 5 0
# 5 0 0 1 0 1 0 0 0 0 0 0 0 0 5 1
# 6 0 0 0 0 1 0 0 0 0 0 0 0 0 5 1
They way this works is that I take the value in each row of hospid
then pastes it with zipid
to make something like zipid5
. I look up the value in that particular column corresponding to the particular row and check if it's 1
.
If there are NA
s in the data frame these can be removed with na.omit
. For example, df <- na.omit(df)
prior to running the code above.
Upvotes: 1
Reputation: 42592
The issue is that the column names zipid1
, zipid2
, etc convey payload data, i.e., the number.
My suggestion is to reshape the data from wide to long form, extract the number from the column name, match it with hospid
, aggregate it by id
, and merge the result with the original wide format.
Aggregation is done using toString()
so that we get a valid result in case of multiple matches.
library(data.table)
# reshape from wide to long format
melt(setDT(DT), id.vars = c("id", "hospid"), variable.name = "zipid")[
# turn column names into integer
, zipid := as.integer(stringr::str_replace(zipid, "zipid", ""))][
# if value is 1 and zipid and hospid do match then store number
value == 1L & zipid == hospid, local := hospid][
# aggregate only mathcing entries by id
!is.na(local), .(local = toString(local)), by = id][
# right join with original data
DT, on = "id"][
# change column order to meet OP's expectation
, setcolorder(.SD, names(DT))]
id zipid1 zipid2 zipid3 zipid4 zipid5 zipid6 zipid7 zipid8 zipid9 zipid10 zipid11 zipid12 zipid13 hospid local 1: 1 0 0 0 0 1 0 0 0 0 0 0 0 0 5 5 2: 2 0 0 1 0 1 0 0 0 0 0 0 0 0 5 5 3: 3 0 0 0 0 0 0 1 0 0 0 0 0 0 5 <NA> 4: 4 0 0 1 0 0 0 0 0 0 0 0 0 0 5 <NA> 5: 5 0 0 1 0 1 0 0 0 0 0 0 0 0 5 5 6: 6 0 0 0 0 1 0 0 0 0 0 0 0 0 5 5
By reshaping, the relevant information in DT
can be condensed to
melt(setDT(DT), id.vars = c("id", "hospid"), variable.name = "zipid")[
, zipid := as.integer(stringr::str_replace(zipid, "zipid", ""))][
value == 1L]
id hospid zipid value 1: 2 5 3 1 2: 4 5 3 1 3: 5 5 3 1 4: 1 5 5 1 5: 2 5 5 1 6: 5 5 5 1 7: 6 5 5 1 8: 3 5 7 1
The result is given by
melt(setDT(DT), id.vars = c("id", "hospid"), variable.name = "zipid")[
, zipid := as.integer(stringr::str_replace(zipid, "zipid", ""))][
value == 1L][
zipid == hospid]
id hospid zipid value 1: 1 5 5 1 2: 2 5 5 1 3: 5 5 5 1 4: 6 5 5 1
So, to combine this with the original data object we can do an update on join:
tmp <-
melt(setDT(DT), id.vars = c("id", "hospid"), variable.name = "zipid")[
, zipid := as.integer(stringr::str_replace(zipid, "zipid", ""))][
value == 1L & zipid == hospid]
DT[tmp, on = "id", local := value][]
id zipid1 zipid2 zipid3 zipid4 zipid5 zipid6 zipid7 zipid8 zipid9 zipid10 zipid11 zipid12 zipid13 hospid local 1: 1 0 0 0 0 1 0 0 0 0 0 0 0 0 5 1 2: 2 0 0 1 0 1 0 0 0 0 0 0 0 0 5 1 3: 3 0 0 0 0 0 0 1 0 0 0 0 0 0 5 NA 4: 4 0 0 1 0 0 0 0 0 0 0 0 0 0 5 NA 5: 5 0 0 1 0 1 0 0 0 0 0 0 0 0 5 1 6: 6 0 0 0 0 1 0 0 0 0 0 0 0 0 5 1
This gives the expected output. No aggregation required.
library(data.table)
DT <- fread("id zipid1 zipid2 zipid3 zipid4 zipid5 zipid6 zipid7 zipid8 zipid9 zipid10 zipid11 zipid12 zipid13 hospid local
1 0 0 0 0 1 0 0 0 0 0 0 0 0 5 0
2 0 0 1 0 1 0 0 0 0 0 0 0 0 5 0
3 0 0 0 0 0 0 1 0 0 0 0 0 0 5 0
4 0 0 1 0 0 0 0 0 0 0 0 0 0 5 0
5 0 0 1 0 1 0 0 0 0 0 0 0 0 5 0
6 0 0 0 0 1 0 0 0 0 0 0 0 0 5 0", drop = "local")
Upvotes: 1
Reputation: 1089
name is a vector of strings and in that context are interpreted as strings and not actually as variables, try as ifelse(get(name)==1 &...
Upvotes: -1