Reputation: 3321
I know data.table is able to tidy multiple columns at once, unlike dplyr, which relies on multiple gather
and spread
steps that can be difficult to visualize.
Here's a knotty problem for the tidyverse:
library(tidyverse)
df <- data_frame(month_1 = c("Jan", "Feb", "Mar", "Jun"),
score_1 = c(4, 5, 6, 4),
month_2 = c("Jan", "Mar", NA, NA),
score_2 = c(3, 2, NA, NA),
month_3 = c("Feb", "Mar", "Jun", NA),
score_3 = c(8, 7, 4, NA))
# A tibble: 4 x 6
month_1 score_1 month_2 score_2 month_3 score_3
<chr> <dbl> <chr> <dbl> <chr> <dbl>
1 Jan 4 Jan 3 Feb 8
2 Feb 5 Mar 2 Mar 7
3 Mar 6 NA NA Jun 4
4 Jun 4 NA NA NA NA
My desired result would be:
id month score
1 Jan 4
1 Feb 5
1 Mar 6
1 Jun 4
2 Jan 3
2 Mar 2
3 Feb 8
3 Mar 7
3 Jun 4
data.table users can solve this by melting with patterns, as:
melt(setDT(df), measure = patterns("^month", "^score"))
But since there's no equivalent dplyr function, I understand there will need to be several spread
. It seems like my solution below should work, but the second spread
goes awry:
df %>%
gather(key, value) %>%
mutate(id = parse_number(key),
key = str_replace(key, "_[0-9]", "")) %>%
spread(key, value )
Before marking this as a duplicate, please give it a try. Similar questions have the unique id in an existing column. This example has ids in the headers.
Upvotes: 2
Views: 168
Reputation: 21264
You can process month
and score
columns separately and then join them with purrr::map_dfc
:
map_dfc(c("month", "score"),
~ df %>%
select_at(vars(matches(.x))) %>%
gather(key, !!.x) %>%
separate(key, c("col", "id"), sep="_")) %>%
filter(complete.cases(.)) %>%
select(id, month, score)
# A tibble: 9 x 3
id month score
<chr> <chr> <chr>
1 1 Jan 4
2 1 Feb 5
3 1 Mar 6
4 1 Jun 4
5 2 Jan 3
6 2 Mar 2
7 3 Feb 8
8 3 Mar 7
9 3 Jun 4
Explanation:
map_dfc
iterates over the string values "month" and "score", referring to the current value as .x
. The dfc
suffix performs a cbind
on the iterated outputs. select_at
chooses only the columns that start with .x
gather
swings from wide to long, and names the values
column with the .x
string value. separate
splits key
into two columns, containing the column type (which corresponds to the .x
value) and the id
number. filter
out missing values and select
our target columns.Upvotes: 1