Tom

Reputation: 2351

Using a conditional in a for loop to create a unique panel id

I have a dataset which looks as follows:

# A tibble: 5,458 x 539
# Groups:   country, id1 [2,729]
   idstd   id2    xxx        id1         country  year 
   <dbl+>   <dbl> <dbl+lbl> <dbl+lbl>   <chr>   <dbl>
 1 445801   NA      NA       7          Albania  2009 
 2 542384 4616555 1163       7          Albania  2013 
 3 445802   NA      NA       8          Albania  2009 
 4 542386 4616355 1162       8          Albania  2013 
 5 445803   NA      NA      25          Albania  2009 
 6 542371 4616545 1161      25          Albania  2013 
 7 445804   NA      NA      30          Albania  2009 
 8 542152 4616556  475      30          Albania  2013
 9 445805   NA      NA      31          Albania  2009 
10 542392 4616542 1160      31          Albania  2013 

The data is panel data, but there is no unique panel id. For example, the first two observations are respondent number 7 from Albania, but number 7 is used again for other countries. id2, however, is unique. My plan is therefore to copy id2 into the NA entry of the corresponding respondent.

I wrote the following code:

for (i in 1:nrow(df)) {
  if (df$id1[i] == df$id1[i+1] & df$country[i] == df$country[i+1]) {
    df$id2[i] <- df$id2[i+1]
  }
}

Which gives the following error:

Error in if (df$id1[i] == df1$id1[i + 1] &  :  missing value where TRUE/FALSE needed

It does however seem to work. As my dataset is quite large and I am not very skilled, I am reluctant to accept the solution I came up with, especially when it gives an error.

Could anyone help explain the error to me?

In addition, is there a more efficient (for example data.table) and ideally error-free way to deal with this?

Upvotes: 0

Views: 53

Answers (1)

deann

Reputation: 796

Could you not do something along these lines:

library(tidyverse)
df %>%
    group_by(country, id1) %>%
    mutate(uniqueId = id2 %>% discard(is.na) %>% unique) %>%
    ungroup()
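
Since you mentioned data.table: a rough equivalent of the grouped approach might look like the sketch below. The toy table `dt` is made up from the first rows of your printout, and `uniqueId` is just an illustrative column name. For your real data you would first need to convert the tibble (for example with `as.data.table(df)`); I have not tested how the labelled columns behave there.

```r
library(data.table)

# Toy data mirroring the question's layout (values from the first rows shown)
dt <- data.table(
  id1     = c(7, 7, 8, 8),
  country = "Albania",
  year    = c(2009, 2013, 2009, 2013),
  id2     = c(NA, 4616555, NA, 4616355)
)

# Within each (country, id1) panel, take the first non-missing id2.
# A panel with no id2 at all simply gets NA.
dt[, uniqueId := id2[!is.na(id2)][1], by = .(country, id1)]

print(dt)
```

This does not depend on the row order inside a panel, only on each panel having at most one distinct non-missing id2.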

Also, judging from your loop, the NA always sits in the row directly above the row that holds the ID, so you could also take the value from the next row with lead():

df %>%
    mutate(id2Lead = lead(id2),  # id2 of the following row
           uniqueId = ifelse(is.na(id2), id2Lead, id2)) %>%
    select(-id2Lead)

Upvotes: 1
