user3154267

Reputation: 41

Implementing code in 'R'?

Suppose I had the following data set.

Index  Country  Age    Time   Response
-----  -------  -----  -----  --------
1      Germany  20-30  15-20  1
2      Germany  20-30  15-20  NA
3      Germany  20-30  15-20  1
4      Germany  20-30  15-20  0
5      France   20-30  30-40  1

I would like to fill in the NA based on the criteria listed below:

  1. Find all rows that match exactly on Country, Age and Time, i.e. Index 1, 3 and 4.
  2. Select at random one value from the Response column of these matching rows, i.e. 1, 1 or 0.
  3. Replace the NA with this new value.

The same procedure should then be applied to every other NA in the data set.

I'm new to 'R' and can't figure out how to code this.

Upvotes: 0

Views: 84

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

Here is one approach using the "data.table" package:

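library(data.table)

# Key the table on the grouping columns, then fill each NA with one
# value drawn at random from the non-missing responses in its group.
DT <- data.table(mydf, key = "Country,Age,Time")
DT[, R2 := ifelse(is.na(Response), sample(na.omit(Response), 1), 
                  Response), by = key(DT)]
DT  # the draw is random, so the output below is one possible result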
#    Index Country   Age  Time Response R2
# 1:     5  France 20-30 30-40        1  1
# 2:     6  France 20-30 30-40       NA  2
# 3:     7  France 20-30 30-40        2  2
# 4:     1 Germany 20-30 15-20        1  1
# 5:     2 Germany 20-30 15-20       NA  1
# 6:     3 Germany 20-30 15-20        1  1
# 7:     4 Germany 20-30 15-20        0  0
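
One caveat with `sample(na.omit(Response), 1)`: if a group has only a single non-missing value and that value is a number of at least 1, `sample()` treats it as the upper bound of `1:x` rather than as a one-element vector (this is documented in `?sample`). A minimal sketch of a safer draw, which samples a position instead of the value; `pick_one` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: draw one element of x by position, so a
# length-one vector such as c(5L) always yields 5, never a value from 1:5.
pick_one <- function(x) {
  x[sample.int(length(x), 1)]
}

single <- c(5L)      # a group with only one observed Response
pick_one(single)     # always 5
sample(single, 1)    # may be any of 1, 2, 3, 4, 5
```

With the sample data in this answer the problem does not show up, because every group has at least two observed responses.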

Similarly, in base R, you could try ave:

within(mydf, {
  R2 <- ave(Response, Country, Age, Time, FUN = function(x) {
    ifelse(is.na(x), sample(na.omit(x), 1), x)
  })
})
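
Both snippets above draw once per group, so every NA in the same group receives the same value. If each NA should get its own draw, a sketch along these lines may help. `fill_na` is a hypothetical helper, and leaving a group untouched when it has no observed values is an assumption about the desired behavior. The sample data is repeated so the block runs on its own.

```r
# Hypothetical helper: replace each NA in x with an independent random
# draw from the non-missing values of x.
fill_na <- function(x) {
  missing  <- is.na(x)
  observed <- x[!missing]
  # Nothing to fill, or nothing to draw from: return x unchanged.
  if (!any(missing) || length(observed) == 0) return(x)
  # Sample positions rather than values, which sidesteps sample()
  # treating a single number n as 1:n.
  idx <- sample.int(length(observed), sum(missing), replace = TRUE)
  x[missing] <- observed[idx]
  x
}

mydf <- data.frame(
  Index    = 1:7,
  Country  = c("Germany", "Germany", "Germany", "Germany",
               "France", "France", "France"),
  Age      = "20-30",
  Time     = c("15-20", "15-20", "15-20", "15-20",
               "30-40", "30-40", "30-40"),
  Response = c(1L, NA, 1L, 0L, 1L, NA, 2L),
  stringsAsFactors = FALSE
)

# ave() applies fill_na within each Country/Age/Time combination.
mydf$R2 <- ave(mydf$Response, mydf$Country, mydf$Age, mydf$Time,
               FUN = fill_na)
```

The same helper should also drop into the data.table version, e.g. `DT[, R2 := fill_na(Response), by = key(DT)]`.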

Sorry, forgot to share the sample data I was working with:

mydf <- structure(list(Index = 1:7, Country = c("Germany", "Germany", 
"Germany", "Germany", "France", "France", "France"), Age = c("20-30", 
"20-30", "20-30", "20-30", "20-30", "20-30", "20-30"), Time = c("15-20", 
"15-20", "15-20", "15-20", "30-40", "30-40", "30-40"), Response = c(1L, 
NA, 1L, 0L, 1L, NA, 2L)), .Names = c("Index", "Country", "Age", 
"Time", "Response"), class = "data.frame", row.names = c(NA, -7L))

Upvotes: 2
