FRANCIS
FRANCIS

Reputation: 31

Create new variables from disorganized data that has variables in multiple columns with pivot_wider

I have a data frame with the following structure:

library(tidyverse)

df <- tibble::tribble(
              ~var1,            ~var2,              ~var3, 
        "ano 2005",      "km 128000",         "marca chevrolet",
        "ano 2019",     "marca hyundai tucson",        "km 50000",
    "marca grand vitara sz",     "ano 2012",              "NA"
    )

I need to create new variables where the corresponding information is assigned for this use the following code

df %>% 
stack() %>% 
select(-ind) %>% 
separate(values, into = c("column", "value")) %>% 
pivot_wider(value, column, values_fn = list(value = list)) %>% 
unnest(cols = c(marca, ano, km))

but having NA I have the following error: No common size for marca, size 120, and km, size 119. There is also another error and it just returns the first word and the rest eliminates it. if anyone can help me I would really appreciate it

 ano   marca        km
2005   chevrolet  128000
2019   hyundai     50000
2012   grand        

Upvotes: 1

Views: 63

Answers (2)

qdread
qdread

Reputation: 3943

Here is a tidyverse solution. This requires the NA value in your data frame to be a NA_character_, not the quoted string "NA". The argument extra='merge' tells separate() not to drop additional words in the car model name beyond the first.

# replace quoted NA with true NA
df[df == 'NA'] <- NA_character_

 df %>%
  mutate(id = 1:nrow(df)) %>%
  pivot_longer(-id) %>%
  separate(value, into = c('column', 'value'), extra = 'merge') %>%
  select(-name) %>%
  filter(!is.na(column)) %>%
  pivot_wider(id_cols = id, names_from = column)

# A tibble: 3 x 4
#     id ano   km     marca          
#  <int> <chr> <chr>  <chr>          
#1     1 2005  128000 chevrolet      
#2     2 2019  50000  hyundai tucson 
#3     3 2012  NA     grand vitara sz

Upvotes: 0

Wimpel
Wimpel

Reputation: 27732

here is my data.table-approach

library( data.table )

#set to data.table format
setDT(df)
#create row_id's
df[, id := .I][]
#melt to long
ans <- melt( df, id.vars = "id" )
#split strings, using first space as separator
ans[, c("col_name", "col_value") := as.data.table( stringr::str_split_fixed( value, " ", 2 ) ) ]
#cast to wide
dcast( ans[!col_name == "NA",], id ~ col_name, value.var = "col_value")

#    id  ano     km           marca
# 1:  1 2005 128000       chevrolet
# 2:  2 2019  50000  hyundai tucson
# 3:  3 2012   <NA> grand vitara sz

Upvotes: 1

Related Questions