Split text in R and assign to select existing columns, while taking into account missing observations

Question

Perhaps an easy question, but I am a beginner and I could not find answers that apply to my case on stack overflow here.

I merged two datasets, which now look like this (the actual dataset has 5,000+ observations):

> data <- structure(list(Country = c("France", "France", "France", "Germany", 
"Germany", "Germany"), Type_a = c("Type_a", "", "Type_a", "Type_a,Type_b", 
"Type_b,Type_c,Type_f", "Type_f"), Type_b = c("", "Type_b", "Type_b", 
"", "", ""), Type_c = c("", "", "Type_c", "", "", ""), Type_d = c("", 
"Type_d", "", "", "", ""), Type_e = c("Type_e", "", "Type_e", 
"", "", ""), Type_f = c("Type_f", "", "Type_f", "", "", "")), row.names = c(NA, 
6L), class = "data.frame")
> View(data)

Please run View(data). Observations for France are in the right columns, but observations for Germany are all in one single column.

I want to split Germany data and assign them to the right columns, while taking into account missing data. That is, data cannot just be split and assigned to all next columns (as in the stack overflow link above), but only to select columns. Also, I do not want to create new columns.

How do I do this?

Thank you.

lroha · Accepted Answer

If this example is representative of your actual data you could possibly do something like:

library(tidyverse)

data %>%
  rowid_to_column() %>%
  separate_rows(Type_a, sep = ",") %>%
  pivot_longer(-c(rowid, Country)) %>%
  mutate(name = if_else(value == "", name, value)) %>%
  pivot_wider(id_cols = c(rowid, Country), values_fn = list(value = ~paste0(.x, collapse = ""))) %>%
  na_if("")

# A tibble: 6 x 8
  rowid Country Type_a Type_b Type_c Type_d Type_e Type_f
                 
1     1 France  Type_a NA     NA     NA     Type_e Type_f
2     2 France  NA     Type_b NA     Type_d NA     NA    
3     3 France  Type_a Type_b Type_c NA     Type_e Type_f
4     4 Germany Type_a Type_b NA     NA     NA     NA    
5     5 Germany NA     Type_b Type_c NA     NA     Type_f
6     6 Germany NA     NA     NA     NA     NA     Type_f

Split text in R and assign to select existing columns, while taking into account missing observations

Answers (1)

Related Questions