Reputation: 13
Is there any way to read data by byte length in R, like the SAS input statement? Suppose a fixed-width table contains some multi-byte characters:
aaa대전11b1
bb 서울21b2
ccc부산갑b3
SAS can read it by byte length, as below.
data test;
infile "filepath";
input
V1 $3.
V2 $6.
V3 $2. ;
run;
→
aaa, 대전11, b1
bb , 서울21, b2
ccc, 부산갑, b3
But in R, read.fwf can only separate data by character widths, not by byte lengths.
So a command like the one below
test <- read.fwf("file path", widths=c(3,6,2))
throws an error, or at best produces output shaped like this
aaa, 대전11b1, NULL
bb , 서울21b2, NULL
ccc, 부산갑b3
So, this is my question: Is there any way to separate data columns by byte length in R?
Upvotes: 1
Views: 213
Reputation: 11985
The code below should give you the desired output. Note that it splits each line with a regular expression rather than by byte length, so treat it as a workaround until you find a better way to do it.
file <- readLines("your_data_file.txt",encoding="UTF-8")
newTxt <- unlist(strsplit(file, split = "\u2028"))
newTxt <- lapply(newTxt, function(x) gsub("^([a-zA-Z]*)(.*)([a-zA-Z0-9]{2})$", "\\1,\\2,\\3", x))
# Build a one-column data frame directly from the converted lines
df <- data.frame(combined_column = unlist(newTxt), stringsAsFactors = FALSE)
library(tidyr)
df %>% separate(combined_column, c("col1", "col2", "col3"), ",")
Output:
col1 col2 col3
1 aaa 대전11 b1
2 bb 서울21 b2
3 ccc 부산갑 b3
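If you need a split that really is based on byte positions, here is a minimal sketch using only base R. It assumes the widths 3, 6, 2 were counted in a legacy Korean encoding where each Hangul syllable takes 2 bytes (CP949 is my assumption, the question does not name the encoding), and that the lines have been read into R as UTF-8. The helper name split_by_bytes is made up for illustration.

```r
# Split one line into fields by byte widths.
# Assumption: `widths` are byte counts in `encoding` (CP949 by default).
split_by_bytes <- function(line, widths, encoding = "CP949") {
  # Convert the UTF-8 string to the target encoding and keep the raw bytes
  bytes <- iconv(line, from = "UTF-8", to = encoding, toRaw = TRUE)[[1]]
  ends <- cumsum(widths)
  starts <- ends - widths + 1
  vapply(seq_along(widths), function(i) {
    # A line shorter than expected yields NA for the missing fields
    if (starts[i] > length(bytes)) return(NA_character_)
    field <- bytes[starts[i]:min(ends[i], length(bytes))]
    # Convert the byte slice back to UTF-8 for normal use in R
    iconv(list(field), from = encoding, to = "UTF-8")
  }, character(1))
}

split_by_bytes("aaa대전11b1", widths = c(3, 6, 2))
# "aaa" "대전11" "b1"

# Applying it to a whole file ("filepath" is the placeholder from the question)
# lines <- readLines("filepath", encoding = "UTF-8")
# test <- as.data.frame(do.call(rbind, lapply(lines, split_by_bytes, widths = c(3, 6, 2))))
```

Unlike the regex approach, this does not depend on which characters appear in each column, only on the byte widths. If a width cuts a 2-byte character in half, the conversion back to UTF-8 will return NA for that field, which is at least a visible signal that the widths or the encoding are off.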
Upvotes: 0