GabyLP
GabyLP

Reputation: 3781

r read data separator with 2 characters

I have a table (with more than 1 million rows) that is separated by '||' and inside one of the columns uses that symbol ('|'). I couldn't read it with fread from data.table because it only admits a length of 1 char, neither with read.table. An example of the lines would be:

 1-Xxxxx||5804||CONTROL REMOTO 5804/5834 - xxxx||31/5/2018 03:00:00||CALLE     EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? -(xxxx) (CIUDAD)||2

where the following was in only one field but it has the symbol '|':

 CALLE EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? - (xxxx) (CIUDAD)

The required split is:

field1= 1-Xxxxx
field2= 5804
field3= CONTROL REMOTO 5804/5834 - HONEYWELL
field4=31/5/2018 03:00:00
field5=CALLE EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? - () (CIUDAD EVITA )
field6= 2

Thanks!

Upvotes: 1

Views: 705

Answers (2)

Axeman
Axeman

Reputation: 35242

Read in your data in one column:

d <- data.table::fread(text = 
'1-Xxxxx||5804||CONTROL REMOTO 5804/5834 - xxxx||31/5/2018 03:00:00||CALLE     EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? -(xxxx) (CIUDAD)||2
 1-Xxxxx||5804||CONTROL REMOTO 5804/5834 - xxxx||31/5/2018 03:00:00||CALLE     EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? -(xxxx) (CIUDAD)||2', 
  sep = '', data.table = FALSE, header = FALSE) 

Separate the columns:

tidyr::separate(d, V1, into = paste('V', 1:6), sep = '\\|\\|', covert = TRUE)
      V 1  V 2                             V 3                V 4
1 1-Xxxxx 5804 CONTROL REMOTO 5804/5834 - xxxx 31/5/2018 03:00:00
2 1-Xxxxx 5804 CONTROL REMOTO 5804/5834 - xxxx 31/5/2018 03:00:00
                                                                                         V 5 V 6
1 CALLE     EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? -(xxxx) (CIUDAD)   2
2 CALLE     EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? -(xxxx) (CIUDAD)   2

Change paste('V', 1:6) to your actual column names.

(This approach is very similar to that of @CalumYou)

Upvotes: 0

Calum You
Calum You

Reputation: 15072

You can read in whole lines and then use regular expressions to split the lines. You have to add some column names to make it into a dataframe so I just used a-f, but replace with your own. You should also be able to replace lines with the path to your text file, here I'm just using literal data as an example.

library(tidyverse)
line <- " 1-Xxxxx||5804||CONTROL REMOTO 5804/5834 - xxxx||31/5/2018 03:00:00||CALLE     EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? -(xxxx) (CIUDAD)||2"
lines <- c(line, line)
line_list <- read_lines(lines) %>%
  str_split("\\|\\|") %>% 
  map(~set_names(., letters[1:6]))
bind_rows(!!!line_list)
#> # A tibble: 2 x 6
#>   a       b     c              d         e                            f    
#>   <chr>   <chr> <chr>          <chr>     <chr>                        <chr>
#> 1 " 1-Xx… 5804  CONTROL REMOT… 31/5/201… CALLE     EL QUIYA CASA 99,… 2    
#> 2 " 1-Xx… 5804  CONTROL REMOT… 31/5/201… CALLE     EL QUIYA CASA 99,… 2

Created on 2019-03-19 by the reprex package (v0.2.1)

Upvotes: 2

Related Questions