xTernal

Reputation: 327

Cannot subset data frame after reading with fread()

I am trying to subset a table named cars, as shown below. I do not want the Country column in my subtable, so I used [,-1] to drop the first column. Instead, my new variable cars.use was assigned the value -1. What happened here?

> library(data.table)
> cars <- fread('cars.csv', header = TRUE)
> typeof(cars)
[1] "list"
> head(cars)
   Country                       Car  MPG Weight Drive_Ratio Horsepower Displacement Cylinders
1:    U.S.        Buick Estate Wagon 16.9  4.360        2.73        155          350         8
2:    U.S. Ford Country Squire Wagon 15.5  4.054        2.26        142          351         8
3:    U.S.        Chevy Malibu Wagon 19.2  3.605        2.56        125          267         8
4:    U.S.    Chrysler LeBaron Wagon 18.5  3.940        2.45        150          360         8
5:    U.S.                  Chevette 30.0  2.155        3.70         68           98         4
6:   Japan             Toyota Corona 27.5  2.560        3.05         95          134         4
> cars.use <- cars[,-1]
> cars.use
[1] -1

Upvotes: 2

Views: 1213

Answers (3)

Rich Scriven

Reputation: 99351

You can solve this problem in your call to fread().

If you change your fread() call to drop the first column by name (or by number), the column will be skipped upon reading.

fread("cars.csv", drop = "Country", header = TRUE)

You are having problems subsetting because fread() returns a data.table by default, not a plain data frame. If you need a data frame, set the data.table argument to FALSE.

cars <- fread("cars.csv", header = TRUE, data.table = FALSE)

Now we have a data frame, and the code cars[,-1] that you used will work. And if you want to drop the column and return a data frame, combine these two.

fread("cars.csv", drop = "Country", header = TRUE, data.table = FALSE)

See help(fread) for further details.
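Both variants can be tried without the original file. The following is a minimal sketch that assumes a small made-up CSV written to a temporary file as a stand-in for cars.csv; the names csv_path, cars_dt, cars_df and cars_use are illustrative only.

```r
library(data.table)

# Write a small stand-in for cars.csv to a temporary file (made-up rows).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country,Car,MPG",
  "U.S.,Buick Estate Wagon,16.9",
  "Japan,Toyota Corona,27.5"
), csv_path)

# Variant 1: skip the Country column while reading; the result is a data.table.
cars_dt <- fread(csv_path, drop = "Country", header = TRUE)
names(cars_dt)

# Variant 2: read everything as a plain data frame,
# then drop the first column by position as in the question.
cars_df <- fread(csv_path, header = TRUE, data.table = FALSE)
cars_use <- cars_df[, -1]
names(cars_use)
```

Dropping the column in fread() avoids reading it at all, which may matter for large files; the data frame route keeps base R subsetting semantics.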

Upvotes: 6

tags

Reputation: 4060

One option is to set the whole Country column to NULL. This can be done as follows:

# Create dataframe
df <- read.delim(text='
Country Car MPG Weight Drive_Ratio Horsepower Displacement Cylinders
U.S. BuickEstateWagon 16.9 4.360 2.73 155 350 8
U.S. FordCountrySquireWagon 15.5 4.054 2.26 142 351 8
U.S. ChevyMalibuWagon 19.2 3.605 2.56 125 267 8
U.S. ChryslerLeBaronWagon 18.5 3.940 2.45 150 360 8
U.S. Chevette 30.0 2.155 3.70 68 98 4
Japan ToyotaCorona 27.5 2.560 3.05 95 134 4', sep=' ')

#> df
#  Country                    Car  MPG Weight Drive_Ratio Horsepower
#1    U.S.       BuickEstateWagon 16.9  4.360        2.73        155
#2    U.S. FordCountrySquireWagon 15.5  4.054        2.26        142
#3    U.S.       ChevyMalibuWagon 19.2  3.605        2.56        125
#4    U.S.   ChryslerLeBaronWagon 18.5  3.940        2.45        150
#5    U.S.               Chevette 30.0  2.155        3.70         68
#6   Japan           ToyotaCorona 27.5  2.560        3.05         95
#  Displacement Cylinders
#1          350         8
#2          351         8
#3          267         8
#4          360         8
#5           98         4
#6          134         4

# Remove the 'Country' column from the dataframe
df$Country <- NULL

#> df
#                     Car  MPG Weight Drive_Ratio Horsepower Displacement
#1       BuickEstateWagon 16.9  4.360        2.73        155          350
#2 FordCountrySquireWagon 15.5  4.054        2.26        142          351
#3       ChevyMalibuWagon 19.2  3.605        2.56        125          267
#4   ChryslerLeBaronWagon 18.5  3.940        2.45        150          360
#5               Chevette 30.0  2.155        3.70         68           98
#6           ToyotaCorona 27.5  2.560        3.05         95          134
#  Cylinders
#1         8
#2         8
#3         8
#4         8
#5         4
#6         4
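The example above uses a plain data frame. For a data.table such as the one fread() returns, a closely related idiom is to assign NULL with the := operator, which removes the column by reference. This is a different technique from the df$Country <- NULL line above, shown here only as a sketch on made-up two-row data (the name cars_dt is illustrative):

```r
library(data.table)

# Small made-up data.table standing in for the result of fread().
cars_dt <- data.table(
  Country = c("U.S.", "Japan"),
  Car     = c("Buick Estate Wagon", "Toyota Corona"),
  MPG     = c(16.9, 27.5)
)

# Remove the Country column in place; no reassignment is needed.
cars_dt[, Country := NULL]
names(cars_dt)
```

Because := modifies the table in place, any other variable pointing at the same data.table loses the column too; use copy() first if the original should be kept.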

Upvotes: 1

akrun

Reputation: 887531

fread() returns a data.table. To subset the columns of a data.table by position, with=FALSE can be used.

cars[,-1, with=FALSE]

This is described in ?data.table:

By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables. When with=FALSE j is a character vector of column names or a numeric vector of column positions to select, and the value returned is always a data.table. with=FALSE is often useful in data.table to select columns dynamically.

data

 cars <- data.table(Col1= 1:5, Col2= 6:10)
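Applied to the two-column example table above, the call can be sketched as follows. The variable names cars_use, keep_cols and cars_use2 are illustrative, and the second form is only one way to select columns dynamically.

```r
library(data.table)

cars <- data.table(Col1 = 1:5, Col2 = 6:10)

# Drop the first column by position; with = FALSE tells data.table
# to treat j as a column selector rather than an expression to evaluate.
cars_use <- cars[, -1, with = FALSE]
names(cars_use)

# Dynamic selection: build the vector of column names to keep,
# then pass it as j with with = FALSE.
keep_cols <- setdiff(names(cars), "Col1")
cars_use2 <- cars[, keep_cols, with = FALSE]
names(cars_use2)
```

Both results are data.tables containing only Col2.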

Upvotes: 7
