user41509
user41509

Reputation: 1040

Quicker way to select different combinations of columns in R

My dataset is:

dput(new)
structure(list(Year = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("1982", "1983", "1985", "1989", "1994", 
"1995", "1998"), class = "factor"), scallopid = 11:20, Region = c("GB", 
"GB", "GB", "GB", "GB", "GB", "GB", "GB", "GB", "GB"), Area = structure(c(3L, 
3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L), .Label = c("Nantucket Lightship", 
"NE GB", "SW GB"), class = "factor"), Station = c("1982288", 
"1982288", "1982288", "1982288", "1982288", "1982329", "1982329", 
"1982329", "1982329", "1982329"), Depth = c(68, 68, 68, 68, 68, 
94, 94, 94, 94, 94), Lat = c(40.833333, 40.833333, 40.833333, 
40.833333, 40.833333, 41.183333, 41.183333, 41.183333, 41.183333, 
41.183333), ring1 = c(79.31, 57.57, 75.13, 79.14, 71.5, 76.75, 
72.06, 59.98, 86.6, 67.7), ring2 = c(104.03, 100.81, 95.95, 109.95, 
105.65, 104.1, 98.19, 93.93, 105.31, 100.57), ring3 = c(124.58, 
122.71, 108.69, 122.14, 119.28, 128.48, 122.2, 110.86, 127.91, 
110.6), ring4 = c(132.44, 129.75, 116.96, NA, NA, 135.48, 128.28, 
119.62, 141.16, 124.22), ring5 = c(NA, NA, 123.42, NA, NA, 141.22, 
135.16, 129.49, 148.86, 132.73), ring6 = c(NA, NA, 129.24, NA, 
NA, 145.51, 140.31, 138.12, 152.15, 138.12), ring7 = c(NA, NA, 
134.44, NA, NA, 148.62, 145.08, 143.71, NA, 141.71), ring8 = c(NA, 
NA, 138.2, NA, NA, 152.3, 147.98, 145.43, NA, 144.9), ring9 = c(NA, 
NA, 140.81, NA, NA, 155.9, 150.36, NA, NA, 145.96), ring10 = c(NA, 
NA, 143.13, NA, NA, 158.5, NA, NA, NA, NA), ring11 = c(NA, NA, 
144.81, NA, NA, NA, NA, NA, NA, NA), ring12 = c(NA, NA, 147.39, 
NA, NA, NA, NA, NA, NA, NA), ring13 = c(NA, NA, 148.74, NA, NA, 
NA, NA, NA, NA, NA), ring14 = c(NA, NA, 149.05, NA, NA, NA, NA, 
NA, NA, NA), ring15 = c(NA, NA, 150.62, NA, NA, NA, NA, NA, NA, 
NA)), .Names = c("Year", "scallopid", "Region", "Area", "Station", 
"Depth", "Lat", "ring1", "ring2", "ring3", "ring4", "ring5", 
"ring6", "ring7", "ring8", "ring9", "ring10", "ring11", "ring12", 
"ring13", "ring14", "ring15"), row.names = 12:21, class = "data.frame")

I want to make a new dataset with the first 7 columns and then different combinations of the rest of the columns.

Long hand way:

#ring 1 and 2
gb1<-new[,c(1:9)]
colnames(gb1)[8]<-"ring1"
colnames(gb1)[9]<-"ring2"

#ring 2 and 3
gb2<-new[,c(1:7,9,10)]
colnames(gb2)[8]<-"ring1"
colnames(gb2)[9]<-"ring2"

#ring 3 and 4
gb3<-new[,c(1:7,10,11)]
colnames(gb3)[8]<-"ring1"
colnames(gb3)[9]<-"ring2"

etc. I do this for all columns and then rbind them to get them back into one dataframe.

Is there a streamlined way to do this?

The end result that I am looking for is:

  Year scallopid Region         Area Station Depth      Lat  ring1  ring2
2  1982         1    MAB VA/NC Border  198297    50 36.68333  78.56  95.45
21 1982         1    MAB VA/NC Border  198297    50 36.68333  95.45 109.49
22 1982         1    MAB VA/NC Border  198297    50 36.68333 109.49 117.20
23 1982         1    MAB VA/NC Border  198297    50 36.68333 117.20 125.86
24 1982         1    MAB VA/NC Border  198297    50 36.68333 125.86 130.75
25 1982         1    MAB VA/NC Border  198297    50 36.68333 130.75 133.32

For each scallopid, the initial ring1 value is the original ring1 value. The ring2 value is the next ring column value in the same row, so for the first record it would be ring . For the next row for the same scallopid: ring1 would be the value of the ring2 column and ring2 would be the value from the ring3 column and so on.

Upvotes: 0

Views: 69

Answers (2)

Parfait
Parfait

Reputation: 107652

Consider doing what you are showing manually in a loop by iterating sequentially through ring column names to build a list of dataframes that you can row bind at the end. The if logic within function is needed for very last iteration which will have an empty ring2.

id_cols <- colnames(new[1:7])
ring_cols <- colnames(new)[8:ncol(new)]

dfList <- lapply(seq_along(ring_cols), function(i) {
  if (is.na(ring_cols[i+1])) {
    tmp <- new[,c(id_cols, ring_cols[i])]
    tmp$ring2 <- NA
  } else {
    tmp <- new[,c(id_cols, ring_cols[i:(i+1)])]
  }

  colnames(tmp)[8:9] <- c("ring1", "ring2")
  return(tmp)
})

finaldf <- do.call(rbind, dfList)

Upvotes: 1

eipi10
eipi10

Reputation: 93851

Extending @joran's suggestion, we can label each ring into ring1 or ring2 after gathering, which can then be operated on by the new grouping:

library(tidyverse)

new.long = new %>% 
  gather(ring, value, ring1:ring15) %>% 
  group_by(ring) %>% 
  mutate(ring_group = ifelse(as.numeric(gsub("ring","", ring)) %% 2 == 1, "ring1", "ring2"))

UPDATE: Your updated example doesn't use the data you posted, and the data you posted contains only one row for each scallopid, so I'm not certain if the code below is on the right track. Let me know.

new.long = new %>% 
  select(Year:Lat, paste0("ring", seq(1,15,2)))
  gather(ring, ring1, ring1:ring15) %>% 
  group_by(scallopid, ring) %>% 
  mutate(ring2 = lead(ring1))

Upvotes: 1

Related Questions