Reputation: 113
I'm reading a file that has a structure like this:
[1111111]aaaa;bbbb;cccc
[2222222]dddd;ffff;gggg
And I want to have a data frame like this:
Column A Column B Column C Column D
1111111 aaaa bbbb cccc
2222222 dddd ffff gggg
So I need to split on ; and get rid of the [ and ] characters.
Here is my code:
Read file
df<-read.csv("file.csv",sep=";")
Replace the [ ]
df_V1 <- gsub(pattern="[",replacement="",df$V1) #ERROR HERE!
df_V1 <- gsub(pattern="]",replacement=";",df$V1) #Replace the ] to ;
Then merge all together
df_V1 <- do.call(rbind.data.frame,strsplit(df_V1,split=";"))
Data<- cbind(
df_V1,
df[,c(2:ncol(df))])
And here is my output
View(Data)
Column A Column B Column C Column D
[1111111 aaaa bbbb cccc
[2222222 dddd ffff gggg
I don't know why the first [ can't be replaced. I already tried using gsub and deleting the first character of the string, but nothing seems to solve it. Any ideas?
Thanks for your time
Upvotes: 3
Views: 55
Reputation: 4378
If the columns truly are fixed in length, then read_fwf from the readr package is useful.
library(readr)
read_fwf(
"[1111111]aaaa;bbbb;cccc
[2222222]dddd;ffff;gggg
", fwf_cols("Column A"=c(2,8), "Column B"=c(10,13), "Column C"=c(15,18), "Column D"=c(20,23)))
# `Column A` `Column B` `Column C` `Column D`
# <int> <chr> <chr> <chr>
# 1 1111111 aaaa bbbb cccc
# 2 2222222 dddd ffff gggg
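If you would rather not add a dependency, the same fixed-width idea can be sketched in base R with read.fwf from utils. This is only a sketch under the same assumption (every field always has the same width); the vector x and the ColumnA to ColumnD names are made up for the illustration. Negative widths tell read.fwf to skip that many characters, which is how the [, ] and ; separators are dropped.

```r
# Sample lines copied from the question (stand-in for the real file)
x <- c("[1111111]aaaa;bbbb;cccc", "[2222222]dddd;ffff;gggg")

con <- textConnection(x)
# Negative widths skip characters: "[", then 7 digits, "]", 4 chars, ";", 4 chars, ";", 4 chars
fwf <- read.fwf(con,
                widths = c(-1, 7, -1, 4, -1, 4, -1, 4),
                col.names = paste0("Column", LETTERS[1:4]),
                stringsAsFactors = FALSE)
close(con)

fwf
```

With a real file you would pass the file name instead of the text connection.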
Upvotes: 1
Reputation: 887118
We can read the data using readLines first, do the string changes with gsub, and then read with read.csv
lines <- readLines("file1.txt")
read.csv(text=sub(";", "", gsub("[][]", ";", lines)),
sep=";", header=FALSE, col.names = paste0("Column", LETTERS[1:4]), stringsAsFactors=FALSE)
# ColumnA ColumnB ColumnC ColumnD
#1 1111111 aaaa bbbb cccc
#2 2222222 dddd ffff gggg
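As for why the original attempt fails: in a regular expression a bare "[" opens a character class, so gsub(pattern="[", ...) is rejected as an invalid pattern. Passing fixed = TRUE (or escaping it as "\\[") makes it a literal. Also note that the second gsub call in the question reads from df$V1 again, so it would throw away the result of the first call even if that call had worked. Here is a minimal sketch of the step-by-step approach with both points addressed; the vector x stands in for the lines of the file and the column names are only illustrative.

```r
# Sample lines copied from the question (stand-in for the real file)
x <- c("[1111111]aaaa;bbbb;cccc", "[2222222]dddd;ffff;gggg")

# fixed = TRUE treats "[" as a literal character, not as regex syntax
step1 <- gsub("[", "", x, fixed = TRUE)
# Chain on the previous result (step1), not on the original vector
step2 <- gsub("]", ";", step1, fixed = TRUE)

# Split every line on ";" and bind the pieces row-wise into a data frame
parts <- do.call(rbind, strsplit(step2, ";", fixed = TRUE))
result <- as.data.frame(parts, stringsAsFactors = FALSE)
names(result) <- paste0("Column", LETTERS[1:4])

result
```

All columns come out as character here; wrap ColumnA in as.integer() if you need numbers.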
Upvotes: 3