umair durrani
umair durrani

Reputation: 6155

How to add individual row number columns for different groups in data?

Data

I am using the palmer penguins data for this reproducible example. For a simple example, I am keeping the data for the year 2007 and the species Adelie:

install.packages("palmerpenguins")
library(palmerpenguins)
library(tidyverse)

ad <- penguins %>% 
  filter(species == "Adelie",
         year == 2007) %>% 
  select(species, island)

Goal and what I tried

I want to create individual columns for row numbers for each island in the data. There are 3 islands:

> unique(ad$island)
[1] Torgersen Biscoe    Dream    
Levels: Biscoe Dream Torgersen

So, there should be 3 row id columns.

I have done following that works:

ad %>% 
  add_count(island, name = "island_counts") %>% 
  group_by(island) %>% 
  mutate(row_id = 0:(unique(island_counts)-1),
         row_id_Torgersen =  if_else(island == "Torgersen", row_id, NA_integer_),
         row_id_Biscoe =  if_else(island == "Biscoe", row_id, NA_integer_),
         row_id_Dream =  if_else(island == "Dream" , row_id, NA_integer_)
    
  ) %>% 
  ungroup()  

# A tibble: 50 x 7
   species island    island_counts row_id row_id_Torgersen row_id_Biscoe row_id_Dream
   <fct>   <fct>             <int>  <int>            <int>         <int>        <int>
 1 Adelie  Torgersen            20      0                0            NA           NA
 2 Adelie  Torgersen            20      1                1            NA           NA
 3 Adelie  Torgersen            20      2                2            NA           NA
 4 Adelie  Torgersen            20      3                3            NA           NA
 5 Adelie  Torgersen            20      4                4            NA           NA
 6 Adelie  Torgersen            20      5                5            NA           NA
 7 Adelie  Torgersen            20      6                6            NA           NA
 8 Adelie  Torgersen            20      7                7            NA           NA
 9 Adelie  Torgersen            20      8                8            NA           NA
10 Adelie  Torgersen            20      9                9            NA           NA
# ... with 40 more rows  

But I want to do this programmatically, so that the number of new columns are not hard coded, but generated based on the number of islands. Is there a better solution? Thank you.

Upvotes: 0

Views: 26

Answers (1)

Duck
Duck

Reputation: 39595

Try this. You can duplicate the island variable and then reshape to wide using pivot_wider(). In this way you can avoid creating individual conditionals:

library(palmerpenguins)
library(tidyverse)

ad <- penguins %>% 
  filter(species == "Adelie",
         year == 2007) %>% 
  select(species, island)
#Code
ad %>% 
  add_count(island, name = "island_counts") %>% 
  group_by(island) %>% 
  mutate(row_id = 0:(unique(island_counts)-1),name=paste0('row_id_',island),
         id=row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = name,values_from=row_id)%>%
  select(-id)

Output:

# A tibble: 50 x 6
   species island    island_counts row_id_Torgersen row_id_Biscoe row_id_Dream
   <fct>   <fct>             <int>            <int>         <int>        <int>
 1 Adelie  Torgersen            20                0            NA           NA
 2 Adelie  Torgersen            20                1            NA           NA
 3 Adelie  Torgersen            20                2            NA           NA
 4 Adelie  Torgersen            20                3            NA           NA
 5 Adelie  Torgersen            20                4            NA           NA
 6 Adelie  Torgersen            20                5            NA           NA
 7 Adelie  Torgersen            20                6            NA           NA
 8 Adelie  Torgersen            20                7            NA           NA
 9 Adelie  Torgersen            20                8            NA           NA
10 Adelie  Torgersen            20                9            NA           NA
# ... with 40 more rows

Upvotes: 1

Related Questions