Reputation: 85
I read the post "read.csv and skip last column in R" but did not find my answer there, so I tried to ask directly in an answer ... but that's not the right way (thanks mjuarez for taking the time to get me back on track).
The original question was:
I have read several other posts about how to import csv files with read.csv but skipping specific columns. However, all the examples I have found had very few columns, and so it was easy to do something like:
columnHeaders <- c("column1", "column2", "column_to_skip")
columnClasses <- c("numeric", "numeric", "NULL")
data <- read.csv(fileCSV, header = FALSE, sep = ",", col.names = columnHeaders, colClasses = columnClasses)
All the answers were good, but they do not work for what I intended to do. So I asked myself and others:
Could this work as a single statement?
data <- read_csv(fileCSV)[,(ncol(data)-1)]
I am trying, in one line of R, to get into data the first 5 of 6 columns, so everything but the last one. To do so, I would like to use "-" with the column number. Do you think that's possible? How can I do that?
Thanks!
Upvotes: 5
Views: 8689
Reputation: 3905
Not a single function, but at least a single line, using dplyr (disclaimer: I never use dplyr or magrittr, so a more optimized solution probably exists using these libraries):
library(dplyr)
# read.csv (rather than read.table) so that commas and the header row are handled
dat <- read.csv(fileCSV) %>% select(., which(names(.) != names(.)[ncol(.)]))
Upvotes: 1
Reputation: 270298
The right hand side of an assignment is processed first so this line from the question:
data <- read.csv(fileCSV)[,(ncol(data)-1)]
is trying to use data before it is defined. Also note that the line above asks for only the second-to-last field. To get all but the last field:
data <- read.csv(fileCSV)
data <- data[-ncol(data)]
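To make the difference between the two index expressions concrete, here is a small sketch. It assumes a made-up three-column CSV passed through the text argument of read.csv in place of a real fileCSV; the variable names are illustrative only.

```r
# Hypothetical in-memory CSV standing in for fileCSV.
csv_text <- "a,b,c\n1,2,3\n4,5,6"

d <- read.csv(text = csv_text)

# Positive index: keeps ONLY the second-to-last column.
second_last <- d[ncol(d) - 1]

# Negative index: keeps everything EXCEPT the last column.
all_but_last <- d[-ncol(d)]

names(second_last)   # "b"
names(all_but_last)  # "a" "b"
```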
If you know the name of the last field, say it is lastField, then the following works. Unlike the code above, it does not read the whole file and then remove the last field; it only reads in the fields other than the last. It is also only one line of code.
read.csv(fileCSV, colClasses = c(lastField = "NULL"))
If you don't know the name of the last field but you do know how many fields there are, say n, then either of these would work:
read.csv(fileCSV)[-n]
read.csv(fileCSV, colClasses = replace(rep(NA, n), n, "NULL"))
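As a rough check that the name-based and count-based variants select the same columns, the sketch below runs them on a made-up CSV supplied via the text argument. The sample data and the result variable names are assumptions for illustration.

```r
# Hypothetical sample data; the last column is literally named lastField.
csv_text <- "a,b,lastField\n1,2,3\n4,5,6"
n <- 3  # number of fields, assumed known in advance

# Skip the last column by name while reading.
by_name <- read.csv(text = csv_text, colClasses = c(lastField = "NULL"))

# Skip the last column by position while reading:
# NA means "guess the type", "NULL" means "do not read this column".
by_count <- read.csv(text = csv_text,
                     colClasses = replace(rep(NA, n), n, "NULL"))

# Read everything, then drop the last column afterwards.
after_read <- read.csv(text = csv_text)[-n]

names(by_name)     # "a" "b"
names(by_count)    # "a" "b"
names(after_read)  # "a" "b"
```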
Another way to do it without first reading in the last field is to first read in the header and first line to calculate the number of fields (assuming that all records have the same number) and then re-read the file using that.
n <- ncol(read.csv(fileCSV, nrows = 1))
and then use either of the two statements above that involve n.
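Putting the two passes together could look like the sketch below. The helper name read_csv_drop_last is hypothetical (it is not part of base R), and the temporary file with made-up contents only exists to make the example self-contained.

```r
# Hypothetical helper: read a CSV while skipping its last column.
read_csv_drop_last <- function(path, ...) {
  # Pass 1: read the header plus one row, just to count the fields.
  n <- ncol(read.csv(path, nrows = 1, ...))
  # Pass 2: NA lets read.csv guess the type; "NULL" skips the column.
  read.csv(path, colClasses = replace(rep(NA, n), n, "NULL"), ...)
}

# Demo on a throwaway file with made-up contents.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6"), tmp)
res <- read_csv_drop_last(tmp)
names(res)  # "a" "b"
unlink(tmp)
```

As in the answer, this assumes every record has the same number of fields as the first one.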
Upvotes: 2
Reputation: 307
It's not possible in one line as the data variable is not yet initialized when you call it. So the command ncol(data) will trigger an error.
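The evaluation order can be observed with a short sketch. It uses a deliberately unused variable name, not_yet_defined, instead of data, because base R already ships a function called data (in utils), so a bare data may resolve to that function rather than fail as described. The sample CSV is made up.

```r
csv_text <- "a,b,c\n1,2,3\n4,5,6"

# The index expression is evaluated before the assignment happens,
# so looking up not_yet_defined fails and nothing gets assigned.
msg <- tryCatch(
  {
    not_yet_defined <- read.csv(text = csv_text)[, -ncol(not_yet_defined)]
    "no error"
  },
  error = function(e) conditionMessage(e)
)

msg  # an "object not found" style message naming not_yet_defined
```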
You would need to use two lines of code to first load your data into the data variable and then remove the last column by either using data[,-ncol(data)] or data[,1:(ncol(data)-1)].
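One caveat with the matrix-style form data[, ...]: when only one column remains, base R simplifies the result to a plain vector. A minimal sketch, assuming a made-up two-column CSV:

```r
two_cols <- read.csv(text = "a,b\n1,2\n3,4")

# Matrix-style indexing simplifies a single remaining column to a vector.
as_vector <- two_cols[, -ncol(two_cols)]
class(as_vector)      # "integer"

# Either of these keeps the result a data frame.
kept_df_drop <- two_cols[, -ncol(two_cols), drop = FALSE]
kept_df_list <- two_cols[-ncol(two_cols)]
class(kept_df_drop)   # "data.frame"
class(kept_df_list)   # "data.frame"
```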
Upvotes: 1
Reputation: 20095
In base R it has to be a two-step operation. Example:
> data <- read.csv("test12.csv")
> data
# 3 columns are returned
a b c
1 1/02/2015 1 3
2 2/03/2015 2 4
# last column is excluded
> data[,-ncol(data)]
a b
1 1/02/2015 1
2 2/03/2015 2
One cannot write data <- read.csv("test12.csv")[,-ncol(data)] in base R.
But if you know the number of columns in your csv (say 3 in my case), then you can write:
df <- read.csv("test12.csv")[,-3]
df
a b
1 1/02/2015 1
2 2/03/2015 2
Upvotes: 4