Reputation: 159
I'm trying to do someting with pivot_longer() to make a wide table long, but I Can't quite figure it out.
Here is the head of the dataframe I am trying to manipulate
head(stack)
unique.pair Area.IN Area.NEAR ALLEVEN.IN ALLEVEN.NEAR TREERICH.IN TREERICH.NEAR HEMIAB.IN HEMIAB.NEAR
1 AGFO 1_AGFO 5 100 100 0.7309552 0.3724176 2 1 1.00 0
2 AGFO 27_AGFO 24 100 100 0.8990520 0.6306221 1 0 1.00 0
3 AGFO 6_AGFO 23 100 100 0.7956735 0.7022392 1 1 1.00 0
4 ALFL LAMR.7_ALFL LAMR.103 100 400 0.4425270 0.6838157 4 6 0.50 0
5 APCO 10_APCO 2 400 400 0.5730378 0.5453876 18 19 0.55 0
6 APCO 4_APCO 9 400 400 0.6349441 0.7078960 22 23 0.55 0
Basically, every row is a unique pair of 2 IDs and their corresponding measurements of certain metrics (.IN and.NEAR); I now need to make it so I have two rows per each unique pair, and I split up their metrics .. for example, I was sort of successful in doing this for "ALLEVEN.IN and ALLEVEN.NEAR". I also need the AREA metrics
master.long <- master.JH %>%
select(unique.pair, ALLEVEN.IN, ALLEVEN.NEAR, HEMIAB.IN, HEMIAB.NEAR, Area.IN, Area.NEAR) %>%
pivot_longer(cols = c(ALLEVEN.IN, ALLEVEN.NEAR), names_to = "HEMI", values_to = "ALLEVEN") %>%
pivot_longer(cols = c(Area.IN, Area.NEAR), names_to = "Area", values_to = "Area_sampled") %>%
separate(HEMI, into = c(NA, "HEMI"))%>%
separate(Area, into = c(NA , "AREA")) %>%
mutate(HEMI.status = case_when(HEMI == "IN" & AREA == "IN" ~ "HEMI",
HEMI == "NEAR" & AREA =="NEAR" ~ "NO.HEMI"))
output is :
# A tibble: 6 x 8
unique.pair HEMIAB.IN HEMIAB.NEAR HEMI ALLEVEN AREA Area_sampled HEMI.status
<chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr>
1 AGFO 6_AGFO 23 1 0 IN 0.796 IN 100 HEMI
2 AGFO 6_AGFO 23 1 0 IN 0.796 NEAR 100 NA
3 AGFO 6_AGFO 23 1 0 NEAR 0.702 IN 100 NA
4 AGFO 6_AGFO 23 1 0 NEAR 0.702 NEAR 100 NO.HEMI
5 AGFO 27_AGFO 24 1 0 IN 0.899 IN 100 HEMI
6 AGFO 27_AGFO 24 1 0 IN 0.899 NEAR 100 NA
2 questions
1.) I see why there are NA's for HEMI.status, but I'm not sure how to tell the code to just drop those values. I can easily do it later, but was wondering if there is a way within pivot longer
2.) Is there any way to do this for all columns with one code of pivot longer for all of the column; I.e. could I incorporate "TREERICH.IN" and "TREERICH.NEAR" into this as well, with the same HEMI column? I tried, but when I say "names_to" = "HEMI" for TREERICH as well (see below) I get an obvious error
master.long <- master.JH %>%
select(unique.pair, ALLEVEN.IN, ALLEVEN.NEAR, HEMIAB.IN, HEMIAB.NEAR, Area.IN, Area.NEAR) %>%
pivot_longer(cols = c(ALLEVEN.IN, ALLEVEN.NEAR), names_to = "HEMI", values_to = "ALLEVEN") %>%
pivot_longer(cols = c(TREERICH.IN, TREERICH.NEAR), names_to = "HEMI", values_to = "TREERICH")
pivot_longer(cols = c(Area.IN, Area.NEAR), names_to = "Area", values_to = "Area_sampled") %>%
separate(HEMI, into = c(NA, "HEMI"))%>%
separate(Area, into = c(NA , "AREA")) %>%
mutate(HEMI.status = case_when(HEMI == "IN" & AREA == "IN" ~ "HEMI",
HEMI == "NEAR" & AREA =="NEAR" ~ "NO.HEMI"))
Hopefully I explained this well enough. Thanks for any help!
Upvotes: 1
Views: 73
Reputation: 886948
If we want to construct a long form from the original data, specify the names_sep
as .
(escape as it is by default in regex mode where .
represents any character) and names_to
with two components the .value
forms the individual columns and the "in_near" gets the suffix part of column name i.e. 'IN', 'NEAR'. If there are NA
elements specify the values_drop_na = TRUE
to remove those NA
s
library(dplyr)
library(tidyr)
master.JH %>%
pivot_longer(cols = -unique.pair,
names_to = c(".value", "in_near"),
names_sep = "\\.", values_drop_na = TRUE)
-output
# A tibble: 12 x 6
# unique.pair in_near Area ALLEVEN TREERICH HEMIAB
# <chr> <chr> <int> <dbl> <int> <dbl>
# 1 AGFO 1_AGFO 5 IN 100 0.731 2 1
# 2 AGFO 1_AGFO 5 NEAR 100 0.372 1 0
# 3 AGFO 27_AGFO 24 IN 100 0.899 1 1
# 4 AGFO 27_AGFO 24 NEAR 100 0.631 0 0
# 5 AGFO 6_AGFO 23 IN 100 0.796 1 1
# 6 AGFO 6_AGFO 23 NEAR 100 0.702 1 0
# 7 ALFL LAMR.7_ALFL LAMR.103 IN 100 0.443 4 0.5
# 8 ALFL LAMR.7_ALFL LAMR.103 NEAR 400 0.684 6 0
# 9 APCO 10_APCO 2 IN 400 0.573 18 0.55
#10 APCO 10_APCO 2 NEAR 400 0.545 19 0
#11 APCO 4_APCO 9 IN 400 0.635 22 0.55
#12 APCO 4_APCO 9 NEAR 400 0.708 23 0
master.JH <- structure(list(unique.pair = c("AGFO 1_AGFO 5", "AGFO 27_AGFO 24",
"AGFO 6_AGFO 23", "ALFL LAMR.7_ALFL LAMR.103", "APCO 10_APCO 2",
"APCO 4_APCO 9"), Area.IN = c(100L, 100L, 100L, 100L, 400L, 400L
), Area.NEAR = c(100L, 100L, 100L, 400L, 400L, 400L), ALLEVEN.IN = c(0.7309552,
0.899052, 0.7956735, 0.442527, 0.5730378, 0.6349441), ALLEVEN.NEAR = c(0.3724176,
0.6306221, 0.7022392, 0.6838157, 0.5453876, 0.707896), TREERICH.IN = c(2L,
1L, 1L, 4L, 18L, 22L), TREERICH.NEAR = c(1L, 0L, 1L, 6L, 19L,
23L), HEMIAB.IN = c(1, 1, 1, 0.5, 0.55, 0.55), HEMIAB.NEAR = c(0L,
0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
Upvotes: 4