I am trying to use a nested ifelse statement within a for loop to create a new variable whose values are based on the frequency of occurrence of a factor variable (a list of postcodes).
The new variable should return a predefined series of numbers based on the frequency of a postcode (frequencies range between 1 and 4). Each series must end at 800 and increase in steps of 200, and its starting point depends on the postcode's frequency: the higher the frequency, the lower the starting value (a postcode that occurs 4 times gets 200, 400, 600, 800; one that occurs once gets only 800).
To do this, I wrote a for loop that first measures the frequency of each postcode and then uses a nested ifelse statement to assign the matching series of numbers to NewVar based on that frequency.
A small example of what I want to achieve is shown below; I want to apply this to a data frame containing millions of postcodes.
DESIRED RESULT:
Postcode NewVar
AA 600
AA 800
BB 400
BB 600
BB 800
CC 800
DD 200
DD 400
DD 600
DD 800
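As a check on the rule described above, each series can be written as `seq(to = 800, by = 200, length.out = n)`, where `n` is the postcode's frequency. The sketch below hard-codes the frequencies from the desired result (AA 2, BB 3, CC 1, DD 4) purely for illustration and confirms that concatenating the series reproduces the NewVar column:

```r
# Frequency of each postcode, read off the desired result above (illustrative)
freq <- c(AA = 2, BB = 3, CC = 1, DD = 4)

# One series per postcode: n values, stepping by 200, always ending at 800
series <- lapply(freq, function(n) seq(to = 800, by = 200, length.out = n))

# Concatenate the series in postcode order to rebuild the NewVar column
new_var <- unlist(series, use.names = FALSE)
print(new_var)
```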
CODE:
DF$NewVar <- 0
DF$NewVar <- for (i in levels(DF$Postcode[i]))
ifelse((table(DF$Postcode[i]) == 4), DF$NewVar[i] <- c(200,400,600,800),
(ifelse ((table(DF$Postcode[i]) == 3), DF$NewVar[i] <- c(400,600,800),
(ifelse ((table(DF$Postcode[i]) == 2), DF$NewVar[i] <- c(600,800),
DF$NewVar[i] <- c(800))))))
PROBLEM 1:
First, when I run the entire code, I get an error saying that the number of rows in the replacement does not match the data, although when I check this manually, they do match (the mismatch is always exactly 1 row):
Error in `$<-.data.frame`(`*tmp*`, NewVar, value = c("0", "0", "0", :
replacement has 11 rows, data has 10.
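A likely cause of Problem 1, offered as a reading of the code rather than a confirmed diagnosis: `i` runs over `levels(...)`, i.e. over postcode names such as "AA", so `DF$NewVar[i]` indexes NewVar by a name it does not have. Assigning through such a character index appends a new element instead of failing, which would explain why the replacement is always exactly one row longer than the data. A minimal sketch with made-up values:

```r
# A stand-in for DF$NewVar: ten unnamed zeros (hypothetical values)
new_var <- rep(0, 10)

# Assigning through a character index that is not one of the vector's names
# does not fail: R appends a new element named "AA"
new_var["AA"] <- 800
print(length(new_var))     # [1] 11
print(names(new_var)[11])  # [1] "AA"

# Inside a data frame the same assignment yields a column one element
# longer than the data frame, so the assignment is rejected with an error
df <- data.frame(Postcode = c("AA", "AA"), NewVar = c(0, 0))
result <- tryCatch({
  df$NewVar["AA"] <- 600
  "no error"
}, error = function(e) conditionMessage(e))
print(result)
```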
PROBLEM 2:
TESTING IF AN IFELSE WORKS ON ITS OWN (OUT OF THE LOOP):
When checking whether the ifelse clause works on its own (outside the loop), I see that only the starting value of 200 is copied onto every row of NewVar, so the series never increases to 800. This is not what I want either:
CODE TESTING ONE IFELSE:
DF$NewVar[1:2] <- ifelse((sum(table(DF$Postcode)) == 2),
DF$NewVar[1:2] <- c(600,800), "NA")
RESULT (not desired):
Postcode NewVar
AA 200
AA 200
DESIRED RESULT:
Postcode NewVar
AA 200
AA 400
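A probable explanation for Problem 2: `ifelse()` returns a result with the same length as its test. The test `sum(table(DF$Postcode)) == 2` is a single TRUE/FALSE, so only the first element of the vector given as the "yes" value comes back, and assigning that single value to `DF$NewVar[1:2]` recycles it over both rows. The sketch below uses made-up values and shows a plain `if`/`else`, which keeps the whole vector when the condition is a single value:

```r
# The test is a single logical value, so ifelse() returns a single value:
# only the first element of the 'yes' vector survives
picked <- ifelse(TRUE, c(600, 800), NA)
print(picked)   # [1] 600

# Assigning one value to two positions recycles it
new_var <- c(0, 0)
new_var[1:2] <- picked
print(new_var)  # [1] 600 600

# A plain if/else returns the whole vector for a single condition
new_var[1:2] <- if (TRUE) c(600, 800) else NA
print(new_var)  # [1] 600 800
```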
Note: I defined the NewVar column before trying to assign values to it, and I have already checked for NAs.
Thank you in advance for your time.
For the sake of completeness, here is a base R solution that uses the ave() function.
Let's assume Postcode is a vector of postcodes in random order:
Postcode
[1] "BB" "CC" "CC" "BB" "BB" "AA" "CC" "BB" "AA" "DD"
The code below creates a data.frame containing Postcode and NewVar:
data.frame(
Postcode,
NewVar = ave(Postcode, Postcode,
FUN = function(x) seq(to = 800, by = 200, length.out = length(x)))
)
   Postcode NewVar
1        BB    200
2        CC    400
3        CC    600
4        BB    400
5        BB    600
6        AA    600
7        CC    800
8        BB    800
9        AA    800
10       DD    800
# create data
library(magrittr) # only used to improve readability
n_codes <- 4L
set.seed(1L)
Postcode <-
stringr::str_dup(LETTERS[1:n_codes], 2L) %>% # create codes
rep(times = sample(n_codes)) %>% # replicate randomly
sample() # re-order randomly
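One caveat worth noting: `ave()` returns a vector of the same type as its first argument, so with a character `Postcode` the resulting `NewVar` is character ("200", "400", ...) even though it prints like numbers. A possible workaround, sketched here as one option rather than the answer's own code, is to pass a numeric placeholder such as `seq_along(Postcode)` as the first argument and use `Postcode` only for grouping. The postcode vector is copied from the printed example above:

```r
# Same example postcodes as printed above
Postcode <- c("BB", "CC", "CC", "BB", "BB", "AA", "CC", "BB", "AA", "DD")

# Original call: the result inherits the character type of Postcode
as_character <- ave(Postcode, Postcode,
  FUN = function(x) seq(to = 800, by = 200, length.out = length(x)))
print(class(as_character))  # [1] "character"

# Numeric placeholder as the first argument, Postcode only as the grouping
as_numeric <- ave(seq_along(Postcode), Postcode,
  FUN = function(x) seq(to = 800, by = 200, length.out = length(x)))
print(class(as_numeric))    # [1] "numeric"

print(data.frame(Postcode, NewVar = as_numeric))
```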
One way if you're willing to use dplyr:
library(dplyr)
DF <- structure(list(Postcode = c("AA", "AA", "BB", "BB", "BB", "CC",
"DD", "DD", "DD", "DD")), class = "data.frame", row.names = c(NA,
-10L))
vals <- c(200,400,600,800)
DF %>% group_by(Postcode) %>% mutate(NewVar = tail(vals,n()))
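If adding dplyr is not an option, the same `tail(vals, n())` idea can be sketched in base R with `ave()`, using a numeric placeholder so the result stays numeric. This is a translation for illustration, not part of the original answer. Like the dplyr version, it assumes no postcode occurs more than four times, since `vals` has only four entries:

```r
# Example data from the answer above
DF <- data.frame(Postcode = c("AA", "AA", "BB", "BB", "BB", "CC",
                              "DD", "DD", "DD", "DD"))
vals <- c(200, 400, 600, 800)

# For each postcode group, keep the last length(x) values of vals,
# i.e. one value per occurrence, always ending at 800
DF$NewVar <- ave(numeric(nrow(DF)), DF$Postcode,
                 FUN = function(x) tail(vals, length(x)))
print(DF)
```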