Ruben Garcia

Reputation: 21

Problems Importing txt file in R with readr instead of read.table

I'm trying to import the following text file:

   "year"   "sex"   "name"       "n"    "prop"
"1" 1880    "F"     "Mary"      7065    0.0723835869064085
"2" 1880    "F"     "Anna"      2604    0.0266789611187951
"3" 1880    "F"     "Emma"      2003    0.0205214896777829
"4" 1880    "F"     "Elizabeth" 1939    0.0198657855642641
"5" 1880    "F"     "Minnie"    1746    0.0178884278469341
"6" 1880    "F"     "Margaret"  1578    0.0161672045489473
"7" 1880    "F"     "Ida"       1472    0.0150811946109318
"8" 1880    "F"     "Alice"     1414    0.0144869627580554
"9" 1880    "F"     "Bertha"    1320    0.0135238973413247
"10"1880    "F"     "Sarah"     1288    0.0131960452845653

and I don't have any problems using:

data <- read.table("~/Documents/baby_names.txt", header = TRUE, sep = "\t")

However, I haven't figured out how to do it with readr. The following command fails:

data2 <- read_tsv("~/Documents/baby_names.txt")

I know the problem is that the first row (the headings) contains five elements while every other row contains six, but I don't know how to tell readr to ignore the leading "1", "2", "3" and so on. Any suggestions?

Upvotes: 2

Views: 1017

Answers (2)

niczky12

Reputation: 5063

You can read in the body and the column names separately and then combine them:

require(readr)

df <- read_tsv("baby_names.txt", col_names = F, skip = 1)

col_names <- read.table("baby_names.txt", header = F, sep = "\t", nrows = 1)

df$X1 <- NULL
names(df) <- col_names

Result:

> head(df)
     1     1         1    1          1
1 1880 FALSE      Mary 7065 0.07238359
2 1880 FALSE      Anna 2604 0.02667896
3 1880 FALSE      Emma 2003 0.02052149
4 1880 FALSE Elizabeth 1939 0.01986579
5 1880 FALSE    Minnie 1746 0.01788843
6 1880 FALSE  Margaret 1578 0.01616720

I don't think there is an easy way of setting row names in read_tsv() as there is with the row.names handling in read.table(), but this should be a sufficient workaround.
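The same two-step idea can be sketched with readr alone. This is a minimal sketch, not the answerer's tested code. It assumes the file is tab-separated and writes a three-row sample to a temporary file as a hypothetical stand-in for baby_names.txt. The header is flattened to a plain character vector, so the names come out as year, sex and so on rather than the 1s shown in the result above. The `col_types` string keeps the sex column as character, since readr otherwise guesses "F" as logical, which is where the FALSE values come from.

```r
library(readr)

# Hypothetical sample standing in for baby_names.txt (assumed tab-separated):
# the header has 5 fields, each data row has 6 (row label + 5 values).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "\"year\"\t\"sex\"\t\"name\"\t\"n\"\t\"prop\"",
  "\"1\"\t1880\t\"F\"\t\"Mary\"\t7065\t0.0723835869064085",
  "\"2\"\t1880\t\"F\"\t\"Anna\"\t2604\t0.0266789611187951",
  "\"3\"\t1880\t\"F\"\t\"Emma\"\t2003\t0.0205214896777829"
), path)

# Read only the header line as data, then flatten it to a character vector.
header <- read_tsv(path, col_names = FALSE, n_max = 1,
                   col_types = cols(.default = col_character()))
col_names <- unlist(header, use.names = FALSE)

# Read the body, skipping the header line. In the compact col_types string,
# "_" drops the row-label column and "c" keeps sex as character.
df <- read_tsv(path, col_names = FALSE, skip = 1, col_types = "_iccid")

# Attach the real column names.
names(df) <- col_names
```

Dropping the row-label column through `col_types` avoids the separate `df$X1 <- NULL` step, at the cost of having to spell out the type of every column.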

Upvotes: 0

zx8754

Reputation: 56149

We can read in two steps (not tested):

# read only the header row as data, convert to character vector
myNames <- unlist(read_tsv(file = "myFile.tsv", col_names = FALSE, n_max = 1), use.names = FALSE)

# read the data, skip 1st row, then drop the 1st column
myData <- read_tsv(file = "myFile.tsv", skip = 1, col_names = FALSE)[, -1]

# assign column names
colnames(myData) <- myNames
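A variation on the steps above is to hand read_tsv() the full set of names up front, with a placeholder for the unlabeled first column, and drop that column afterwards. The following is a rough sketch under stated assumptions: `read_tsv_with_row_labels` is a hypothetical helper (it is not part of readr), the file is assumed to be tab-separated, and the two-row sample file is made up for illustration.

```r
library(readr)

# Hypothetical helper, not part of readr: reads a TSV whose header has one
# field fewer than the data rows because the first column holds row labels.
read_tsv_with_row_labels <- function(path, ...) {
  # The first line holds the real column names; strip the surrounding quotes.
  header <- strsplit(readLines(path, n = 1), "\t", fixed = TRUE)[[1]]
  header <- gsub("\"", "", trimws(header), fixed = TRUE)
  # Prepend a placeholder name for the row-label column, read the body,
  # then drop that first column.
  out <- read_tsv(path, skip = 1, col_names = c(".row", header), ...)
  out[, -1]
}

# Made-up sample file for illustration.
path <- tempfile(fileext = ".tsv")
writeLines(c(
  "\"year\"\t\"sex\"\t\"name\"\t\"n\"\t\"prop\"",
  "\"1\"\t1880\t\"F\"\t\"Mary\"\t7065\t0.0723835869064085",
  "\"2\"\t1880\t\"F\"\t\"Anna\"\t2604\t0.0266789611187951"
), path)

# Force sex to character so "F" is not guessed as logical.
res <- read_tsv_with_row_labels(path, col_types = cols(sex = col_character()))
```

Because the names are known before the body is parsed, `col_types` can refer to columns by name, which is harder to do when the names are assigned after reading.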

Upvotes: 1
