Reputation: 25
I'm trying to import txt data into R; however, due to the txt file's unique formatting, I'm unsure of how to do this. I definitely feel that the issue is related to the fact that the txt file was formatted to line up columns with column names; however, as it's a text file, this was done with a variety of spaces. For example:
Gene Chromosomal Swiss-Prot MIM Description
name position AC Entry name code
______________ _______________ ______________________ ______ ______________________
A3GALT2 1p35.1 U3KPV4 A3LT2_HUMAN Alpha-1,3-galactosyltransferase 2 (EC 2.4.1.87) (Isoglobotriaosylceramide synthase) (iGb3 synthase) (iGb3S) [A3GALT2P] [IGBS3S]
AADACL3 1p36.21 Q5VUY0 ADCL3_HUMAN Arylacetamide deacetylase-like 3 (EC 3.1.1.-)
AADACL4 1p36.21 Q5VUY2 ADCL4_HUMAN Arylacetamide deacetylase-like 4 (EC 3.1.1.-)
ABCA4 1p21-p22.1 P78363 ABCA4_HUMAN 601691 Retinal-specific phospholipid-transporting ATPase ABCA4 (EC 7.6.2.1) (ATP-binding cassette sub-family A member 4) (RIM ABC transporter) (RIM protein) (RmP) (Retinal-specific ATP-binding cassette transporter) (Stargardt disease protein) [ABCR]
ABCB10 1q42 Q9NRK6 ABCBA_HUMAN 605454 ATP-binding cassette sub-family B member 10, mitochondrial precursor (ATP-binding cassette transporter
Because of this, I have not been able to import my data whatsoever. Because it was made to be justified text with spaces, the number of spaces aren't uniform at all.
This is the link to the data sheet that I am using: https://www.uniprot.org/docs/humchr01.txt
Upvotes: 0
Views: 67
Reputation: 20765
Each field has a fixed width. Therefore, you can use the function read.fwf
to read the file.
The following code reads the input file (assuming the file has only the rows, without the headers)
f = read.fwf('input.txt', c(14,16,11,12,7,250), strip.white=T)
colnames(f) = c('Gene name', 'Chromosomal position', 'Swiss-Prot AC',
'Swiss-Prot Entry name', 'MIM code', 'Description')
Upvotes: 1