Danielle

Reputation: 795

dplyr: group variables then assign unique names based on unique grouping

I have a dataframe like so:

df<- data.frame(date= c(rep("10-29-16", 3), rep("11-14-16", 2),
                      "12-29-16","10-2-17","9-2-17"),
                loc= c(rep("A", 3), rep("B", 2),"A","PlotA","PlotB"), 
                obs_network= c(rep("NA", 3), rep("NA", 2),"NA","PlotA","PlotB"))

Where obs_network is NA, I want to assign a name for each unique date and loc combination. Each unique group should get a unique number with the prefix "pseudoplot". So the output would look like this:

output<- data.frame(date= c(rep("10-29-16", 3), rep("11-14-16", 2),
                      "12-29-16","10-2-17","9-2-17"),
                loc= c(rep("A", 3), rep("B", 2),"A","PlotA","PlotB"), 
                obs_network= c(rep("pseudoplot_1", 3),rep("pseudoplot_2", 2),"pseudoplot_3","PlotA","PlotB"))

I have tried the following without success, and I cannot identify my error. With the code below, all the levels read "pseudoplot_1". I would greatly appreciate it if someone could explain why my code is not working, in addition to providing a solution.

output<-
  df %>%
  group_by(date, loc)%>%
  mutate(obs_network=ifelse(is.na(obs_network), 
                      paste0("pseudoplot", "_", match(loc, unique (loc))), 
                             obs_network))

Upvotes: 1

Views: 805

Answers (2)

jazzurro

Reputation: 23574

This is what I came up with. It relies on two conditions: 1) date is a Date object, and 2) loc and obs_network are character vectors. I created the sample data below to meet those conditions.

         date   loc obs_network
1  2016-10-29     A        <NA>
2  2016-10-29     A        <NA>
3  2016-10-29     A        <NA>
4  2016-11-14     B        <NA>
5  2016-11-14     B        <NA>
6  2016-12-29     A        <NA>
7  2017-10-02 PlotA       PlotA
8  2017-09-02 PlotB       PlotB
9  2017-10-10     A        <NA>
10 2017-10-10     B        <NA>

I used two things. First, I took the differences between consecutive dates with diff(). Second, I passed those differences to cumsum() to create a group number that increases each time the date changes. Pasting the group number and loc together then gives a unique name per group.

library(dplyr)

mydf %>%
  mutate(obs_network = if_else(is.na(obs_network),
                               # group number grows whenever the date changes; loc is appended
                               paste0("pseudoplot_", cumsum(c(TRUE, abs(diff(date)) > 0)), loc),
                               obs_network))


#         date   loc   obs_network
#1  2016-10-29     A pseudoplot_1A
#2  2016-10-29     A pseudoplot_1A
#3  2016-10-29     A pseudoplot_1A
#4  2016-11-14     B pseudoplot_2B
#5  2016-11-14     B pseudoplot_2B
#6  2016-12-29     A pseudoplot_3A
#7  2017-10-02 PlotA         PlotA
#8  2017-09-02 PlotB         PlotB
#9  2017-10-10     A pseudoplot_6A
#10 2017-10-10     B pseudoplot_6B
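
The cumsum() step can be looked at in isolation on a plain Date vector. This is only a minimal sketch, assuming rows that share a date sit next to each other; `dates` and `run_id` are illustrative names that do not appear in the code above.

```r
# Illustrative vector (not the original data): three runs of equal dates.
dates <- as.Date(c("2016-10-29", "2016-10-29",
                   "2016-11-14", "2016-11-14",
                   "2016-12-29"))

# diff() is non-zero wherever the date changes from one row to the next.
# The leading TRUE opens the first run, and cumsum() counts the runs so far.
run_id <- cumsum(c(TRUE, as.numeric(diff(dates)) != 0))

run_id
```

Because the counter advances on every date change, rows that already have an obs_network still use up a number, which is why the output above jumps from pseudoplot_3A to pseudoplot_6A.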

DATA

mydf <- structure(list(date = structure(c(17103, 17103, 17103, 17119, 
17119, 17164, 17441, 17411, 17449, 17449), class = "Date"), loc = c("A", 
"A", "A", "B", "B", "A", "PlotA", "PlotB", "A", "B"), obs_network = c(NA, 
NA, NA, NA, NA, NA, "PlotA", "PlotB", NA, NA)), .Names = c("date", 
"loc", "obs_network"), row.names = c(NA, -10L), class = "data.frame")

Upvotes: 1

B Williams

Reputation: 2050

A few notes:

  1. You have included "NA" in your data frame, so these are text (actually factors), not actual NA values. I recommend changing your original data frame.

    df <- tibble(date= c(rep("10-29-16", 3), 
                             rep("11-14-16", 2),"12-29-16","10-2-17","9-2-17"),
                loc= c(rep("A", 3), rep("B", 2), "A", "PlotA", "PlotB"), 
                obs_network= c(rep(NA, 6), "PlotA", "PlotB"))
    
  2. Mixing factors (which is what your data.frame() call was creating) with character vectors or integers in ifelse() is going to cause issues. I've changed the dataset to a tibble so that everything is a character vector, and I am using if_else().

  3. Lastly, don't use group_by() for this; simply keep everything flat.

    df %>% 
      mutate(obs_network = if_else(is.na(obs_network), 
                           paste0("pseudoplot", "_",  match(paste0(date,loc), unique(paste0(date,loc)))),
                           obs_network))
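
As for why the grouped attempt numbers every group 1: after group_by(date, loc), mutate() evaluates its expression one group at a time, and within a single group loc has only one distinct value, so match(loc, unique(loc)) can only return 1. The sketch below contrasts the two approaches on a made-up data frame, `toy`, which is an assumption for illustration and not the asker's data.

```r
library(dplyr)

# Made-up example data: two date/loc combinations.
toy <- tibble(date = c("10-29-16", "10-29-16", "11-14-16"),
              loc  = c("A", "A", "B"))

# Grouped: match() only sees one group's rows at a time, so every id is 1.
grouped <- toy %>%
  group_by(date, loc) %>%
  mutate(id = match(loc, unique(loc))) %>%
  ungroup()

# Flat: match() sees the whole column, so ids follow order of first appearance.
flat <- toy %>%
  mutate(id = match(paste0(date, loc), unique(paste0(date, loc))))

grouped$id
flat$id
```

Keeping the data ungrouped lets match() compare each row's date and loc key against the keys of the whole column, which is what produces a different number per combination.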
    

Upvotes: 0
