Reputation: 3
I have several .txt files which need to be imported to R as dataframes for some data analysis. One of these files has no EOL in any form, so I'm left wondering how I would go about to import that.
\"A\";\"B\";\"C\";\"D\";\"D\";\"E\";\"F\";\"G\";\"H\";\"I\";\"J\";\"K\";\"L\";\"M\";\"N\";\"O\";\"P\";\"Q\";\"R\";\"S\";\"T\";\"U\";\"V\"\"1\";4;\"55-555-5555-555\";1234-56-78;\"111\";1510;5;1234-12-17;12345.1234512345;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA\"2\";6;\"22-222-2222-222\";5678-56-78;\"222\";2051;0;NA;0;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA
This is how the first ~500 characters of that .txt file look like. The EOL would need to be placed like this:
\"A\";\"B\";\"C\";\"D\";\"D\";\"E\";\"F\";\"G\";\"H\";\"I\";\"J\";\"K\";\"L\";\"M\";\"N\";\"O\";\"P\";\"Q\";\"R\";\"S\";\"T\";\"U\";\"V\"
\"1\";4;\"55-555-5555-555\";1234-56-78;\"111\";1510;5;1234-12-17;12345.1234512345;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA
\"2\";6;\"22-222-2222-222\";5678-56-78;\"222\";2051;0;NA;0;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA
Normally I would just gsub a "\n" to the places I need it to be, but there is no reoccurring string at the places where I would place a \n, so I don't think that gsub would work in this instance.
Seeing how the missing values are clearly indicated with NA, is there a function similar to read_delim that has a "col_number = x" argument? Like the first x values are the headers, the next x values are the values of the first row and so on and so forth?
If it changes anything, these .txt files are rather big (>300mb).
Big thank you to Julian_Hn. Works like a charm.
Upvotes: 0
Views: 236
Reputation: 2141
I would probably just read this in as a vector and then reformat as matrix with the number of columns you know are in the dataset. This essentially does what you want
str <- "\"A\";\"B\";\"C\";\"D\";\"D\";\"E\";\"F\";\"G\";\"H\";\"I\";\"J\";\"K\";\"L\";\"M\";\"N\";\"O\";\"P\";\"Q\";\"R\";\"S\";\"T\";\"U\";\"V\";\"1\";4;\"55-555-5555-555\";1234-56-78;\"111\";1510;5;1234-12-17;12345.1234512345;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;\"2\";6;\"22-222-2222-222\";5678-56-78;\"222\";2051;0;NA;0;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA"
vec <- strsplit(str,";")[[1]]
//EDIT: add byrow = T To stay in the right format. Thanks Yuriy
table <- matrix(vec,ncol=23,nrow=3, byrow = T)
df <- as.data.frame(table)
Upvotes: 1