Reputation: 892
Perhaps an easy question, but I am a beginner and I could not find answers that apply to my case on stack overflow here.
I merged two datasets, which now look like this (the actual dataset has 5,000+ observations):
> data <- structure(list(Country = c("France", "France", "France", "Germany",
"Germany", "Germany"), Type_a = c("Type_a", "", "Type_a", "Type_a,Type_b",
"Type_b,Type_c,Type_f", "Type_f"), Type_b = c("", "Type_b", "Type_b",
"", "", ""), Type_c = c("", "", "Type_c", "", "", ""), Type_d = c("",
"Type_d", "", "", "", ""), Type_e = c("Type_e", "", "Type_e",
"", "", ""), Type_f = c("Type_f", "", "Type_f", "", "", "")), row.names = c(NA,
6L), class = "data.frame")
> View(data)
Please run View(data)
. Observations for France are in the right columns, but observations for Germany are all in one single column.
I want to split Germany data and assign them to the right columns, while taking into account missing data. That is, data cannot just be split and assigned to all next columns (as in the stack overflow link above), but only to select columns. Also, I do not want to create new columns.
How do I do this?
Thank you.
Upvotes: 1
Views: 43
Reputation: 34501
If this example is representative of your actual data you could possibly do something like:
library(tidyverse)
data %>%
rowid_to_column() %>%
separate_rows(Type_a, sep = ",") %>%
pivot_longer(-c(rowid, Country)) %>%
mutate(name = if_else(value == "", name, value)) %>%
pivot_wider(id_cols = c(rowid, Country), values_fn = list(value = ~paste0(.x, collapse = ""))) %>%
na_if("")
# A tibble: 6 x 8
rowid Country Type_a Type_b Type_c Type_d Type_e Type_f
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 France Type_a NA NA NA Type_e Type_f
2 2 France NA Type_b NA Type_d NA NA
3 3 France Type_a Type_b Type_c NA Type_e Type_f
4 4 Germany Type_a Type_b NA NA NA NA
5 5 Germany NA Type_b Type_c NA NA Type_f
6 6 Germany NA NA NA NA NA Type_f
Upvotes: 1