Reputation: 3781
I have a table (with more than 1 million rows) that is separated by '||' and inside one of the columns uses that symbol ('|'). I couldn't read it with fread from data.table because it only admits a length of 1 char, neither with read.table. An example of the lines would be:
1-Xxxxx||5804||CONTROL REMOTO 5804/5834 - xxxx||31/5/2018 03:00:00||CALLE EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? -(xxxx) (CIUDAD)||2
where the following was in only one field but it has the symbol '|':
CALLE EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? - (xxxx) (CIUDAD)
The required split is:
field1= 1-Xxxxx
field2= 5804
field3= CONTROL REMOTO 5804/5834 - HONEYWELL
field4=31/5/2018 03:00:00
field5=CALLE EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? - () (CIUDAD EVITA )
field6= 2
Thanks!
Upvotes: 1
Views: 705
Reputation: 35242
Read in your data in one column:
d <- data.table::fread(text =
'1-Xxxxx||5804||CONTROL REMOTO 5804/5834 - xxxx||31/5/2018 03:00:00||CALLE EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? -(xxxx) (CIUDAD)||2
1-Xxxxx||5804||CONTROL REMOTO 5804/5834 - xxxx||31/5/2018 03:00:00||CALLE EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? -(xxxx) (CIUDAD)||2',
sep = '', data.table = FALSE, header = FALSE)
Separate the columns:
tidyr::separate(d, V1, into = paste('V', 1:6), sep = '\\|\\|', covert = TRUE)
V 1 V 2 V 3 V 4 1 1-Xxxxx 5804 CONTROL REMOTO 5804/5834 - xxxx 31/5/2018 03:00:00 2 1-Xxxxx 5804 CONTROL REMOTO 5804/5834 - xxxx 31/5/2018 03:00:00 V 5 V 6 1 CALLE EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? -(xxxx) (CIUDAD) 2 2 CALLE EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? -(xxxx) (CIUDAD) 2
Change paste('V', 1:6)
to your actual column names.
(This approach is very similar to that of @CalumYou)
Upvotes: 0
Reputation: 15072
You can read in whole lines and then use regular expressions to split the lines. You have to add some column names to make it into a dataframe so I just used a-f
, but replace with your own. You should also be able to replace lines
with the path to your text file, here I'm just using literal data as an example.
library(tidyverse)
line <- " 1-Xxxxx||5804||CONTROL REMOTO 5804/5834 - xxxx||31/5/2018 03:00:00||CALLE EL QUIYA CASA 99, MANZANA 99, - SECCION 8, CIRCUNSCRIPCION 4°|? -(xxxx) (CIUDAD)||2"
lines <- c(line, line)
line_list <- read_lines(lines) %>%
str_split("\\|\\|") %>%
map(~set_names(., letters[1:6]))
bind_rows(!!!line_list)
#> # A tibble: 2 x 6
#> a b c d e f
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 " 1-Xx… 5804 CONTROL REMOT… 31/5/201… CALLE EL QUIYA CASA 99,… 2
#> 2 " 1-Xx… 5804 CONTROL REMOT… 31/5/201… CALLE EL QUIYA CASA 99,… 2
Created on 2019-03-19 by the reprex package (v0.2.1)
Upvotes: 2