Is there any way to read data by byte length in R?

Question

Is there any way to read data by byte length in R, like the SAS input statement? Suppose a fixed-width table contains multi-byte characters:

aaa대전11b1
bb 서울21b2
ccc부산갑b3

SAS can read it by byte length, as below.

data test;
infile "filepath";
input
V1 $3.
V2 $6.
V3 $2. ;
run;

→

aaa, 대전11, b1
bb , 서울21, b2
ccc, 부산갑, b3

But in R, read.fwf can only separate data by character widths, not by byte lengths.

So a command like the one below

test <- read.fwf("file path", widths=c(3,6,2))

produces an error or, at best, output shaped like this

aaa, 대전11b1, NULL
bb , 서울21b2, NULL
ccc, 부산갑b3
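
The mismatch appears to come from counting characters rather than bytes. A quick sketch of the difference, assuming the file is stored in CP949/EUC-KR (an assumption inferred from the widths: 6 bytes for "대전11" implies 2 bytes per Hangul syllable). The variable names are illustrative only.

```r
# "대전11" written with \u escapes so the snippet does not depend on the
# encoding of the script file.
x <- paste0("\ub300\uc804", "11")

# Number of characters: 2 Hangul syllables + 2 digits
n_chars <- nchar(x, type = "chars")

# Bytes in CP949 (assumed file encoding): 2 per Hangul syllable + 2 digits
n_bytes_cp949 <- nchar(iconv(x, from = "UTF-8", to = "CP949"), type = "bytes")

# Bytes in UTF-8: 3 per Hangul syllable + 2 digits
n_bytes_utf8 <- nchar(enc2utf8(x), type = "bytes")
```

Under that assumption the second field is 4 characters but 6 bytes in the file, so a width of 6 characters overruns into the third field.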

So, this is my question: is there any way to separate data columns by byte lengths in R?

Prem · Accepted Answer

With the code below you should get the desired output (note: treat this as a workaround until you find a better way to do it):

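# Read the file and split on the Unicode line separator, if present
file <- readLines("your_data_file.txt", encoding = "UTF-8")
newTxt <- unlist(strsplit(file, split = "\u2028"))
# Insert commas between the leading letters, the middle part, and the last two characters
newTxt <- lapply(newTxt, function(x) gsub("^([a-zA-Z]*)(.*)([a-zA-Z0-9]{2})$", "\\1,\\2,\\3", x))
df = do.call(rbind.data.frame, newTxt)
names(df) <- "combined_column"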

library(tidyr)
df %>% separate(combined_column, c("col1", "col2", "col3"), ",")

Output:

  col1    col2 col3
1  aaa  대전11   b1
2   bb  서울21   b2
3  ccc  부산갑   b3
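
The regex workaround above depends on the shape of the sample data rather than on byte positions. A minimal sketch that splits on byte offsets directly is given here, assuming the file is stored in CP949/EUC-KR (2 bytes per Hangul syllable, which is what the widths 3, 6, 2 imply) and that the platform's iconv supports that encoding. The function names split_bytes and read_fwf_bytes are hypothetical helpers, not part of base R or any package.

```r
# Split one line's raw bytes into fields of the given byte widths,
# then decode each field from `encoding` to UTF-8.
split_bytes <- function(bytes, widths, encoding) {
  stopifnot(length(bytes) >= sum(widths))  # line must be long enough
  ends <- cumsum(widths)
  starts <- ends - widths + 1
  vapply(seq_along(widths), function(i) {
    field <- rawToChar(bytes[starts[i]:ends[i]])
    iconv(field, from = encoding, to = "UTF-8")
  }, character(1))
}

# Read a fixed-width file whose column widths are given in bytes.
# `encoding` is the assumed encoding of the file on disk.
read_fwf_bytes <- function(path, widths, encoding = "CP949") {
  # readLines keeps the bytes as stored; charToRaw exposes them for slicing
  lines <- readLines(path, warn = FALSE)
  rows <- lapply(lines, function(x) split_bytes(charToRaw(x), widths, encoding))
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}
```

A call such as read_fwf_bytes("filepath", widths = c(3, 6, 2)) would then return a data frame with columns V1, V2 and V3. If the file were UTF-8 instead, each Hangul syllable would take 3 bytes and the widths would need adjusting; a width that cuts through a multi-byte character yields NA for that field.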
