Devin

Reputation: 25

Issues importing text data into R due to formatting

I'm trying to import a .txt file into R, but because of the file's unusual formatting I'm unsure how to do it. I suspect the problem is that the columns were lined up under their column names, and since it's a plain-text file, this was done with varying numbers of spaces. For example:

Gene           Chromosomal     Swiss-Prot             MIM    Description
name           position        AC        Entry name   code
______________ _______________ ______________________ ______ ______________________
A3GALT2       1p35.1          U3KPV4     A3LT2_HUMAN        Alpha-1,3-galactosyltransferase 2 (EC 2.4.1.87) (Isoglobotriaosylceramide synthase) (iGb3 synthase) (iGb3S) [A3GALT2P] [IGBS3S]
AADACL3       1p36.21         Q5VUY0     ADCL3_HUMAN        Arylacetamide deacetylase-like 3 (EC 3.1.1.-)
AADACL4       1p36.21         Q5VUY2     ADCL4_HUMAN        Arylacetamide deacetylase-like 4 (EC 3.1.1.-)
ABCA4         1p21-p22.1      P78363     ABCA4_HUMAN 601691 Retinal-specific phospholipid-transporting ATPase ABCA4 (EC 7.6.2.1) (ATP-binding cassette sub-family A member 4) (RIM ABC transporter) (RIM protein) (RmP) (Retinal-specific ATP-binding cassette transporter) (Stargardt disease protein) [ABCR]
ABCB10        1q42            Q9NRK6     ABCBA_HUMAN 605454 ATP-binding cassette sub-family B member 10, mitochondrial precursor (ATP-binding cassette transporter 

Because of this, I haven't been able to import the data at all. Since the text was justified with spaces, the number of spaces between fields isn't uniform.
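A quick way to confirm the problem is base R's count.fields(), which splits each line on runs of whitespace the way read.table() does by default. This sketch writes two rows copied from the excerpt above to a temporary file (the file name is arbitrary) and counts the pieces per row.

```r
# Two rows copied from the excerpt above, written to a temporary file
rows <- c(
  "A3GALT2       1p35.1          U3KPV4     A3LT2_HUMAN        Alpha-1,3-galactosyltransferase 2 (EC 2.4.1.87) (Isoglobotriaosylceramide synthase) (iGb3 synthase) (iGb3S) [A3GALT2P] [IGBS3S]",
  "AADACL3       1p36.21         Q5VUY0     ADCL3_HUMAN        Arylacetamide deacetylase-like 3 (EC 3.1.1.-)"
)
tmp <- tempfile(fileext = ".txt")
writeLines(rows, tmp)

# Split each line on whitespace (read.table's default separator) and
# count the resulting fields; the two rows do not agree
n_fields <- count.fields(tmp)
print(n_fields)
```

The rows give 15 and 9 fields because the description contains spaces and the MIM code is sometimes blank, so no whitespace-based separator can recover the six columns.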

This is the link to the data sheet that I am using: https://www.uniprot.org/docs/humchr01.txt

Upvotes: 0

Views: 67

Answers (1)

Aziz

Reputation: 20765

Each field has a fixed width. Therefore, you can use the function read.fwf to read the file.
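To illustrate what "fixed width" means here, this sketch slices one row from the question at the character positions implied by the widths used below (14, 16, 11, 12, 7, 250), using base R's substring() and trimws(). It only shows the idea; read.fwf() also handles reading the file and converting column types.

```r
# One data row from the question; every field is padded with spaces
line <- "AADACL3       1p36.21         Q5VUY0     ADCL3_HUMAN        Arylacetamide deacetylase-like 3 (EC 3.1.1.-)"

widths <- c(14, 16, 11, 12, 7, 250)  # character width of each column
ends   <- cumsum(widths)             # last character position of each field
starts <- ends - widths + 1          # first character position of each field

# Cut the line at those positions, then drop the padding spaces
fields <- trimws(substring(line, starts, ends))
print(fields)
```

The empty fifth field is the blank MIM code, which is exactly the case that breaks whitespace splitting.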

The following code reads the input file, assuming the file contains only the data rows (with the header lines removed):

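# Field widths: gene name, chromosomal position, Swiss-Prot AC,
# Swiss-Prot entry name, MIM code, description
f <- read.fwf('input.txt', widths = c(14, 16, 11, 12, 7, 250), strip.white = TRUE)
colnames(f) <- c('Gene name', 'Chromosomal position', 'Swiss-Prot AC',
                 'Swiss-Prot Entry name', 'MIM code', 'Description')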
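As a sketch of running the same read.fwf() call without editing the file by hand, the example below writes the question's header lines plus two data rows to a temporary file and uses read.fwf()'s skip argument to step over the header. The rule for finding how many lines to skip (the first line starting with at least five underscores) is an assumption based on the excerpt, not something the UniProt file is guaranteed to follow.

```r
# Header lines and two data rows from the question's excerpt
sample_lines <- c(
  "Gene           Chromosomal     Swiss-Prot             MIM    Description",
  "name           position        AC        Entry name   code",
  "______________ _______________ ______________________ ______ ______________________",
  "AADACL3       1p36.21         Q5VUY0     ADCL3_HUMAN        Arylacetamide deacetylase-like 3 (EC 3.1.1.-)",
  "ABCA4         1p21-p22.1      P78363     ABCA4_HUMAN 601691 Retinal-specific phospholipid-transporting ATPase ABCA4 (EC 7.6.2.1)"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Assumption: the data start right after the first line of underscores
n_header <- grep("^_{5,}", readLines(tmp))[1]

f <- read.fwf(tmp, widths = c(14, 16, 11, 12, 7, 250), skip = n_header,
              strip.white = TRUE)
colnames(f) <- c('Gene name', 'Chromosomal position', 'Swiss-Prot AC',
                 'Swiss-Prot Entry name', 'MIM code', 'Description')
str(f)
```

The blank MIM code is read as NA because blank fields count as missing in numeric columns. The last width (250) caps how much of each description is read, so a larger value is a cheap safeguard if some descriptions are longer. If the full file has text after the table, read.fwf()'s n argument can limit how many rows are read.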

Upvotes: 1
