Skotani1
Skotani1

Reputation: 159

difficulty using pivot_longer()

I'm trying to do someting with pivot_longer() to make a wide table long, but I Can't quite figure it out.

Here is the head of the dataframe I am trying to manipulate

head(stack)
                unique.pair Area.IN Area.NEAR ALLEVEN.IN ALLEVEN.NEAR TREERICH.IN TREERICH.NEAR HEMIAB.IN HEMIAB.NEAR
1             AGFO 1_AGFO 5     100       100  0.7309552    0.3724176           2             1      1.00           0
2           AGFO 27_AGFO 24     100       100  0.8990520    0.6306221           1             0      1.00           0
3            AGFO 6_AGFO 23     100       100  0.7956735    0.7022392           1             1      1.00           0
4 ALFL LAMR.7_ALFL LAMR.103     100       400  0.4425270    0.6838157           4             6      0.50           0
5            APCO 10_APCO 2     400       400  0.5730378    0.5453876          18            19      0.55           0
6             APCO 4_APCO 9     400       400  0.6349441    0.7078960          22            23      0.55           0

Basically, every row is a unique pair of 2 IDs and their corresponding measurements of certain metrics (.IN and.NEAR); I now need to make it so I have two rows per each unique pair, and I split up their metrics .. for example, I was sort of successful in doing this for "ALLEVEN.IN and ALLEVEN.NEAR". I also need the AREA metrics

master.long <- master.JH %>%
  select(unique.pair, ALLEVEN.IN, ALLEVEN.NEAR, HEMIAB.IN, HEMIAB.NEAR, Area.IN, Area.NEAR) %>%
  pivot_longer(cols = c(ALLEVEN.IN, ALLEVEN.NEAR), names_to = "HEMI", values_to = "ALLEVEN") %>%
  pivot_longer(cols = c(Area.IN, Area.NEAR), names_to = "Area", values_to = "Area_sampled") %>% 
  separate(HEMI, into = c(NA, "HEMI"))%>%
  separate(Area, into = c(NA , "AREA")) %>%
  mutate(HEMI.status = case_when(HEMI == "IN" & AREA == "IN" ~ "HEMI",
                                 HEMI == "NEAR" & AREA =="NEAR" ~ "NO.HEMI"))

output is :

# A tibble: 6 x 8
  unique.pair     HEMIAB.IN HEMIAB.NEAR HEMI  ALLEVEN AREA  Area_sampled HEMI.status
  <chr>               <dbl>       <dbl> <chr>   <dbl> <chr>        <dbl> <chr>      
1 AGFO 6_AGFO 23          1           0 IN      0.796 IN             100 HEMI       
2 AGFO 6_AGFO 23          1           0 IN      0.796 NEAR           100 NA         
3 AGFO 6_AGFO 23          1           0 NEAR    0.702 IN             100 NA         
4 AGFO 6_AGFO 23          1           0 NEAR    0.702 NEAR           100 NO.HEMI    
5 AGFO 27_AGFO 24         1           0 IN      0.899 IN             100 HEMI       
6 AGFO 27_AGFO 24         1           0 IN      0.899 NEAR           100 NA   

2 questions

1.) I see why there are NA's for HEMI.status, but I'm not sure how to tell the code to just drop those values. I can easily do it later, but was wondering if there is a way within pivot longer

2.) Is there any way to do this for all columns with one code of pivot longer for all of the column; I.e. could I incorporate "TREERICH.IN" and "TREERICH.NEAR" into this as well, with the same HEMI column? I tried, but when I say "names_to" = "HEMI" for TREERICH as well (see below) I get an obvious error

master.long <- master.JH %>%
  select(unique.pair, ALLEVEN.IN, ALLEVEN.NEAR, HEMIAB.IN, HEMIAB.NEAR, Area.IN, Area.NEAR) %>%
  pivot_longer(cols = c(ALLEVEN.IN, ALLEVEN.NEAR), names_to = "HEMI", values_to = "ALLEVEN") %>%
pivot_longer(cols = c(TREERICH.IN, TREERICH.NEAR), names_to = "HEMI", values_to = "TREERICH")
  pivot_longer(cols = c(Area.IN, Area.NEAR), names_to = "Area", values_to = "Area_sampled") %>% 
  separate(HEMI, into = c(NA, "HEMI"))%>%
  separate(Area, into = c(NA , "AREA")) %>%
  mutate(HEMI.status = case_when(HEMI == "IN" & AREA == "IN" ~ "HEMI",
                                 HEMI == "NEAR" & AREA =="NEAR" ~ "NO.HEMI"))

Hopefully I explained this well enough. Thanks for any help!

Upvotes: 1

Views: 73

Answers (1)

akrun
akrun

Reputation: 886948

If we want to construct a long form from the original data, specify the names_sep as . (escape as it is by default in regex mode where . represents any character) and names_to with two components the .value forms the individual columns and the "in_near" gets the suffix part of column name i.e. 'IN', 'NEAR'. If there are NA elements specify the values_drop_na = TRUE to remove those NAs

library(dplyr)
library(tidyr)
master.JH %>% 
     pivot_longer(cols = -unique.pair, 
        names_to = c(".value", "in_near"), 
           names_sep = "\\.", values_drop_na = TRUE)

-output

# A tibble: 12 x 6
#   unique.pair               in_near  Area ALLEVEN TREERICH HEMIAB
#   <chr>                     <chr>   <int>   <dbl>    <int>  <dbl>
# 1 AGFO 1_AGFO 5             IN        100   0.731        2   1   
# 2 AGFO 1_AGFO 5             NEAR      100   0.372        1   0   
# 3 AGFO 27_AGFO 24           IN        100   0.899        1   1   
# 4 AGFO 27_AGFO 24           NEAR      100   0.631        0   0   
# 5 AGFO 6_AGFO 23            IN        100   0.796        1   1   
# 6 AGFO 6_AGFO 23            NEAR      100   0.702        1   0   
# 7 ALFL LAMR.7_ALFL LAMR.103 IN        100   0.443        4   0.5 
# 8 ALFL LAMR.7_ALFL LAMR.103 NEAR      400   0.684        6   0   
# 9 APCO 10_APCO 2            IN        400   0.573       18   0.55
#10 APCO 10_APCO 2            NEAR      400   0.545       19   0   
#11 APCO 4_APCO 9             IN        400   0.635       22   0.55
#12 APCO 4_APCO 9             NEAR      400   0.708       23   0        

data

master.JH <- structure(list(unique.pair = c("AGFO 1_AGFO 5", "AGFO 27_AGFO 24", 
"AGFO 6_AGFO 23", "ALFL LAMR.7_ALFL LAMR.103", "APCO 10_APCO 2", 
"APCO 4_APCO 9"), Area.IN = c(100L, 100L, 100L, 100L, 400L, 400L
), Area.NEAR = c(100L, 100L, 100L, 400L, 400L, 400L), ALLEVEN.IN = c(0.7309552, 
0.899052, 0.7956735, 0.442527, 0.5730378, 0.6349441), ALLEVEN.NEAR = c(0.3724176, 
0.6306221, 0.7022392, 0.6838157, 0.5453876, 0.707896), TREERICH.IN = c(2L, 
1L, 1L, 4L, 18L, 22L), TREERICH.NEAR = c(1L, 0L, 1L, 6L, 19L, 
23L), HEMIAB.IN = c(1, 1, 1, 0.5, 0.55, 0.55), HEMIAB.NEAR = c(0L, 
0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

Upvotes: 4

Related Questions