Reputation: 379
Need: Pivot from wide to long, stacking groups of corresponding columns.
In essence, I have 3 sets of 5 columns with a need for each of the corresponding columns to be stacked into 1 (i.e., the first variable in each of the 3 sets becomes 1 column, the second variable in each each is the 2nd column, etc). For instance, I need: columns #2, 7, & 12 to all be in 1 column, 3, 8, & 13 in the next column, ... 6, 11, & 16 all in 1 column.
Data structure: I have a dataset that resembles this:
df <- tibble(
pid = c(1, 2, 3, 4),
v1_1 = c(19, NA, NA, NA),
v1_2 = c(12, NA, NA, NA),
v2_1 = c(15, NA, NA, NA),
v2_2 = c(19, NA, NA, NA),
v1_entry_3 = c(11, NA, NA, NA),
v1_1_1 = c(NA, NA, 36, NA),
v1_2_1 = c(NA, NA, 35, NA),
v2_1_1 = c(NA, NA, 31, NA),
v2_2_1 = c(NA, NA, 39, NA),
v1_entry_3_1 = c(NA, NA, 33, NA),
v1_1_2 = c(NA, 26, NA, 41),
v1_2_2 = c(NA, 29, NA, 44),
v2_1_2 = c(NA, 21, NA, 42),
v2_2_2 = c(NA, 20, NA, 45),
v1_entry_3_2 = c(NA, 22, NA, 44),
age = c(19, 21, 33, 47)
)
In the end, I need data that look like this:
df_t <- tibble(
pid = c(1, 2, 3, 4),
v1_1 = c(19, 26, 36, 41),
v1_2 = c(12, 29, 35, 44),
v2_1 = c(15, 21, 31, 42),
v2_2 = c(19, 20, 39, 45),
v1_entry_3 = c(11, 22, 33, 44),
age = c(19, 21, 33, 47)
)
Upvotes: 2
Views: 748
Reputation: 42572
For the sake of completeness, here is an approach using the melt()
function:
library(data.table)
cols <- names(df)[2:6]
melt(setDT(df), measure = patterns(cols), value.name = cols, na.rm = TRUE)[order(pid)]
pid age variable v1_1 v1_2 v2_1 v2_2 v1_entry_3 1: 1 19 1 19 12 15 19 11 2: 2 21 3 26 29 21 20 22 3: 3 33 2 36 35 31 39 33 4: 4 47 3 41 44 42 45 44
Here, we benefit from the fact that the column names of the first set of columns to be reshaped can be reused as the column names of the reshaped output.
Upvotes: 2
Reputation: 21938
Here is a base R solution:
colnames <- startsWith(names(df), "v")
cbind(df[!colnames],
do.call(cbind, lapply(split.default(df[colnames], gsub("(v\\d_\\d|[[:alpha:]]+)_.*", "\\1", names(df)[colnames])),
function(x) apply(x, 1, \(x) x[!is.na(x)]))))
pid age v1_1 v1_2 v1_entry v2_1 v2_2
1 1 19 19 12 11 15 19
2 2 21 26 29 22 21 20
3 3 33 36 35 33 31 39
4 4 47 41 44 44 42 45
Upvotes: 4
Reputation: 34586
You need for column names to be matched on everything up to the second underscore:
library(tidyr)
df %>%
pivot_longer(
-c(pid, age),
names_pattern = "([^_]*_[^_]*)",
names_to = ".value",
values_drop_na = TRUE
)
# A tibble: 4 x 7
pid age v1_1 v1_2 v2_1 v2_2 v1_entry
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 19 19 12 15 19 11
2 2 21 26 29 21 20 22
3 3 33 36 35 31 39 33
4 4 47 41 44 42 45 44
Upvotes: 6
Reputation: 887711
Consider renaming some of the columns before doing the pivot_longer
library(dplyr)
library(stringr)
library(tidyr)
df %>%
rename_with(~ str_c(., '_0'), matches("^v\\d+_\\d+$|^v\\d+_entry_\\d+$")) %>%
pivot_longer(cols = -c(pid, age), names_to = c(".value"),
names_pattern = "(.*)_\\d+$", values_drop_na = TRUE)
# A tibble: 4 x 7
pid age v1_1 v1_2 v2_1 v2_2 v1_entry_3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 19 19 12 15 19 11
2 2 21 26 29 21 20 22
3 3 33 36 35 31 39 33
4 4 47 41 44 42 45 44
Upvotes: 4