Reputation: 311

Assign colClasses to certain columns in data frames with unknown length

I have a number of data files that I am reading into R as CSVs. I would like to specify the colClasses of certain columns in these data files, but the lengths of the dataframes are unknown as they contain species abundance data (hence, different numbers of species).

Is there a way that I can set, say, every column after the first 10 to numeric (so, ncol[10]:length(df)) using colClasses in read.csv?

This is what I tried, but to no avail:

df <- read.csv("file.csv", header=T, colClasses=c(ncols[10], rep("numeric", ncols)))

Any help would be greatly appreciated.

Thanks, Paul

Upvotes: 3

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

I would start by using count.fields to determine how many columns there are in the data. You only need to run it on the first line of the file.

Then, from there, you can use rep for your colClasses.

It's fugly, but works. Here's an example:

The first few lines are just to create a dummy csv file in your workspace since you didn't provide a reproducible example.

X <- tempfile()
cat("A,B,C,D,E,F",
    "1,2,3,4,5,6",
    "6,5,4,3,2,1", sep = "\n", file = X)

This is where the actual answer starts. Replace X with your actual file name (in quotes) in both places below, as in the commented-out version. The -2 is there because two columns are already accounted for by the explicit "numeric" entries.

Y <- read.csv(X, colClasses = c(
  "numeric", "numeric", rep("character", count.fields(textConnection(
    readLines(X, n=1)), sep=",")-2)))

# Y <- read.csv("file.csv", colClasses = c(
#   "numeric", "numeric", rep(
#     "character", count.fields(textConnection(readLines(
#       "file.csv", n = 1)), sep = ",")-2)))

str(Y)
# 'data.frame':  2 obs. of  6 variables:
#  $ A: num  1 6
#  $ B: num  2 5
#  $ C: chr  "3" "4"
#  $ D: chr  "4" "3"
#  $ E: chr  "5" "2"
#  $ F: chr  "6" "1"
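
For the case in the question (leave the first 10 columns alone, force everything after them to numeric), the same idea can be sketched roughly as below. This is only a sketch under some assumptions: the demo file, the column names (meta1..meta10, sp1, sp2) and the n_fixed variable are made up for illustration, and the file is assumed to have more than 10 columns. An NA entry in colClasses tells read.csv to guess that column's class as it normally would.

```r
# Hypothetical demo file: 10 descriptive columns followed by 2 species columns.
f <- tempfile(fileext = ".csv")
header <- paste(c(paste0("meta", 1:10), "sp1", "sp2"), collapse = ",")
rows <- c(paste(c(letters[1:10], 3, 0), collapse = ","),
          paste(c(letters[11:20], 7, 2), collapse = ","))
writeLines(c(header, rows), f)

n_fixed <- 10  # number of leading columns whose class read.csv should guess

# Count the fields on the header line only, then close the connection.
con <- textConnection(readLines(f, n = 1))
n_cols <- count.fields(con, sep = ",")
close(con)

# NA = guess as usual; every remaining column is forced to numeric.
classes <- c(rep(NA, n_fixed), rep("numeric", n_cols - n_fixed))

df <- read.csv(f, header = TRUE, colClasses = classes)
str(df)
```

With your own data you would drop the demo-file lines and point f at your file. If a file might have 10 or fewer columns, check n_cols first, because rep fails when its count is negative.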

Upvotes: 1
