Reputation: 83
I am trying to load a data set into R. It is a text file for a simple stats project.
flights<-read.table("flights.txt")
However, when I do this I get the error "Error in read.table("FLIGHTS.txt") : duplicate 'row.names' are not allowed".
Here is a sample of what the text document looks like.
Flight Plane_ID Dep_Delay Taxi_Out Taxi_In_Arr_Delay
1 N338AA -2 30 12 -32
1 N329AA -1 19 13 -25
1 N319AA -2 12 8 -26
1 N319AA 2 19 21 -6
1 N329AA -2 18 17 5
1 N320AA 0 22 11 -15
I also added the underscores to the names because I was getting an error about the number of elements in each line.
After adding row.names = NULL I get this output:
row.names Flight Plane_ID Dep_Delay Taxi_Out Taxi_In_Arr_Delay
1 1 N338AA -2 30 12 -32
2 1 N329AA -1 19 13 -25
3 1 N319AA -2 12 8 -26
4 1 N319AA 2 19 21 -6
There is an extra set of row numbers, and it displays row.names. Is there any way to get rid of this?
Upvotes: 0
Views: 2779
Reputation: 886948
A data.frame cannot take duplicate row names. We could use row.names=NULL within the read.table call, which will create an extra column, row.names, that can be removed by subsetting the dataset.
dat <- read.table('flights.txt', row.names=NULL)
dat <- dat[-1]
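To see what this produces, here is a minimal sketch that writes a few sample rows from the question to a temporary file (standing in for 'flights.txt') and reads them back. The header has five names while each row has six fields, which is why read.table treats the first field as row names. Option B is an assumption on my part: the names "Taxi_In" and "Arr_Delay" are guesses at the intended six-column header and do not appear in the original file.

```r
# Sample rows copied from the question; a temporary file stands in for 'flights.txt'.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Flight Plane_ID Dep_Delay Taxi_Out Taxi_In_Arr_Delay",
  "1 N338AA -2 30 12 -32",
  "1 N329AA -1 19 13 -25",
  "1 N319AA -2 12 8 -26"
), path)

# The header has 5 names but each row has 6 fields, so read.table() would use
# the first field as row names unless row.names = NULL is given.
dat <- read.table(path, row.names = NULL, stringsAsFactors = FALSE)
names(dat)  # "row.names" followed by the five header names

# Option A (as in the answer): drop the first column. The flight number is
# discarded, and the column labelled Flight now holds the plane IDs.
dat_a <- dat[-1]

# Option B (assumption): keep all six columns and relabel them.
# "Taxi_In" and "Arr_Delay" are hypothetical names, not taken from the file.
dat_b <- dat
names(dat_b) <- c("Flight", "Plane_ID", "Dep_Delay",
                  "Taxi_Out", "Taxi_In", "Arr_Delay")
```

Option A matches the output style shown in this answer; Option B may be preferable if the flight number is needed later.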
Another option would be to use awk to replace the first column by '' from line 2 onwards in 'flights.txt', then pipe the result and read it using read.table.
dat1 <- read.table(pipe("awk 'NR >1{$1=\"\"}1' flights.txt"),
header=TRUE, stringsAsFactors=FALSE)
dat1
# Flight Plane_ID Dep_Delay Taxi_Out Taxi_In_Arr_Delay
#1 N338AA -2 30 12 -32
#2 N329AA -1 19 13 -25
#3 N319AA -2 12 8 -26
#4 N319AA 2 19 21 -6
#5 N329AA -2 18 17 5
#6 N320AA 0 22 11 -15
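If awk is not available (for example on a default Windows setup), roughly the same transformation can be sketched in base R: read the lines, strip the first field from every line after the header, and pass the result to read.table through its text argument. The temporary file and the names raw and dat2 are placeholders for illustration only.

```r
# Sample rows copied from the question; a temporary file stands in for 'flights.txt'.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Flight Plane_ID Dep_Delay Taxi_Out Taxi_In_Arr_Delay",
  "1 N338AA -2 30 12 -32",
  "1 N329AA -1 19 13 -25",
  "1 N319AA -2 12 8 -26"
), path)

raw <- readLines(path)

# Remove the first whitespace-delimited field on every line except the header,
# so that each data row has the same number of fields as the header.
raw[-1] <- sub("^[^[:space:]]+[[:space:]]+", "", raw[-1])

# Header and rows now both have five fields, so header = TRUE reads cleanly.
dat2 <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)
```

As with the awk version, the flight number is dropped and the existing five header names are reused for the remaining columns.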
Upvotes: 1