JGonzalezM

Reputation: 21

subset data frame returns factor with level instead of single value

I have a data frame that contains 2 columns, filename and monitorid.

  filename monitorid
1  001.csv         1
2  002.csv         2
3  003.csv         3
4  004.csv         4
5  005.csv         5
6  006.csv         6

I am trying to subset it to select the filename for a given monitorid:

filename <- files[files$monitorid==3,1]

I expected this to return "003.csv"

Instead, it returns

[1] 003.csv
6 Levels: 001.csv 002.csv 003.csv 004.csv 005.csv 006.csv

However,

filename <- files[files$monitorid==3,2]

returns

[1] 3

as expected.

I do not understand why choosing column 1 returns a factor with multiple levels while column 2 returns a single value.

Any ideas would be greatly appreciated.


@KenM This is the function I used to read the file names:

getfileinfo <- function(directory) {
        ## Read the file names into the filenames variable
        filenames <- list.files(path = directory)
        ## Assign a monitorid to each file name
        monitorid <- as.numeric(substr(filenames, 1, 3))
        ## Combine filenames and monitorid into the data frame files
        files <- data.frame(filenames, monitorid)
        names(files) <- c("filename", "monitorid")
        return(files)
}

Solution

Here is the output from each line:

    filenames <- list.files (path = directory)
    class(filenames)
[1] "character"
    monitorid <- as.numeric(substr(filenames,1,3))
    class(monitorid)
[1] "numeric"
    files <- data.frame(filenames, monitorid)
    sapply (files, class)
filenames monitorid 
 "factor" "numeric" 

As noted by both KenM and BeginneR, when the two vectors are combined into a data frame, the character vector filenames becomes a column of class factor.

Corrected code

files <- data.frame(filenames, monitorid, stringsAsFactors = FALSE)
sapply (files, class)
  filenames   monitorid 
"character"   "numeric" 
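For anyone who wants to reproduce this without my directory, here is a minimal sketch. The six file names are made up to match the example above, and stringsAsFactors is spelled out in both calls so the result does not depend on the R version (as far as I know, the default changed to FALSE in R 4.0.0).

```r
# Hypothetical file names standing in for list.files(path = directory)
filenames <- sprintf("%03d.csv", 1:6)
monitorid <- as.numeric(substr(filenames, 1, 3))

# Same data, built once with factor conversion and once without
as_factor <- data.frame(filename = filenames, monitorid = monitorid,
                        stringsAsFactors = TRUE)
as_char   <- data.frame(filename = filenames, monitorid = monitorid,
                        stringsAsFactors = FALSE)

# Subsetting the factor column keeps all six levels attached
class(as_factor[as_factor$monitorid == 3, 1])  # "factor"

# Subsetting the character column gives the plain string
as_char[as_char$monitorid == 3, 1]             # "003.csv"
```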

Upvotes: 1

Views: 6806

Answers (1)

KenM

Reputation: 2826

I do not understand why choosing column 1 returns a factor with multiple levels while column 2 returns a single value.

You get a factor because the "filename" column was loaded as a factor, while (I suppose) you want a string/character value for the "filename" object.

Solutions are either:

1. When you load the csv file, read the values as character instead of factor; or
2. Convert the factor into character.

For solution 1, set colClasses = "character" in read.csv() (see ?read.csv). For solution 2, do filename <- as.character(files[files$monitorid==3,1])
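A rough sketch of both options, assuming a small inline CSV in place of a real file (csv_text and files_f are made-up names for illustration). Note that a single colClasses = "character" would be recycled to every column, so this sketch gives one class per column to keep monitorid numeric.

```r
# Hypothetical CSV content; in practice this would be a file on disk
csv_text <- "filename,monitorid\n001.csv,1\n002.csv,2\n003.csv,3"

# Solution 1: read the filename column as character up front
files <- read.csv(text = csv_text,
                  colClasses = c("character", "numeric"))
from_read <- files[files$monitorid == 3, 1]

# Solution 2: the column is already a factor, so convert after subsetting
files_f <- data.frame(filename = c("001.csv", "002.csv", "003.csv"),
                      monitorid = 1:3,
                      stringsAsFactors = TRUE)
from_factor <- as.character(files_f[files_f$monitorid == 3, 1])

# Both approaches end up with a plain character value
class(from_read)
class(from_factor)
```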

(BTW, please include a reproducible example when asking a question)

Upvotes: 1
