math

Reputation: 2022

Importing a .csv file containing strings and numerics in R: how to convert the columns?

Suppose I have a .csv file and imported it in R:

    X        A       B       C 
1           good    luck    man
2 string1            
3 string2   2.2     3.3     4
4 string3   0.1     10      3

I used:

test <- read.csv("~/Desktop/test.csv", stringsAsFactors=FALSE)

This is of class data.frame. Now I delete the first row and set the first column as the row names:

test <- test[-1,]
rownames(test) <- test[,1]
test <- test[,-1]

This gives

> test
          A   B C
string1          
string2 2.2 3.3 4
string3 0.1  10 3

The problem is that all the values are of class character. I would like to convert them to numeric and turn the empty "cells" (i.e. empty strings "") into NA, while still ending up with a data.frame. How can this be achieved?

Upvotes: 0

Views: 188

Answers (2)

Spacedman

Reputation: 94172

So your problem is that your CSV has two heading lines, and you want to use the first as column names?

Read the data in with read.csv, skipping both header lines with skip=2 and (probably) head=FALSE, which partially matches the header argument.

Then you have a data frame with generic column names but correct types.

Then read the second line of the file again, using readLines, and split that to get column names for the data frame you read in.

> df = read.csv("twohead.txt",skip=2,head=FALSE)
> colnames(df)=strsplit(readLines("twohead.txt",n=2)[2],",")[[1]]
> df
          good luck man
1 string1   NA   NA  NA
2 string2  1.2  1.1 2.2
3 string3  1.5  3.2 1.2

Rowname processing is as you have it, although I'd do:

> rownames(df)=df[,1]
> df[[1]]=NULL

Giving a df:

> summary(df)
      good            luck            man      
 Min.   :1.200   Min.   :1.100   Min.   :1.20  
 1st Qu.:1.275   1st Qu.:1.625   1st Qu.:1.45  
 Median :1.350   Median :2.150   Median :1.70  
 Mean   :1.350   Mean   :2.150   Mean   :1.70  
 3rd Qu.:1.425   3rd Qu.:2.675   3rd Qu.:1.95  
 Max.   :1.500   Max.   :3.200   Max.   :2.20  
 NA's   :1       NA's   :1       NA's   :1     
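The steps above can be sketched end to end as follows. This is only an illustration under assumptions: the file contents are reconstructed from the table in the question, and the temporary file stands in for the real CSV (the variable name path is hypothetical).

```r
# Assumed file layout: two header lines, then the data rows from the question
path <- tempfile(fileext = ".csv")
writeLines(c("X,A,B,C",
             ",good,luck,man",
             "string1,,,",
             "string2,2.2,3.3,4",
             "string3,0.1,10,3"), path)

# Skip both header lines so read.csv infers numeric types from the data only
df <- read.csv(path, skip = 2, header = FALSE, stringsAsFactors = FALSE)

# Re-read the second line and split it on commas to recover the column names
colnames(df) <- strsplit(readLines(path, n = 2)[2], ",")[[1]]

# Move the first column into the row names, then drop it
rownames(df) <- df[, 1]
df[[1]] <- NULL

str(df)
```

Because the text header lines never reach the type inference, the empty fields in the string1 row are read as NA directly and no later conversion is needed.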

Upvotes: 1

David Arenburg

Reputation: 92282

I couldn't find a good duplicate, so here goes. Use [] to preserve the class of test, in combination with lapply, which operates over the columns of the data frame (a data.frame is essentially a list whose elements are its columns):

test[] <- lapply(test, as.numeric)
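As a minimal sketch of that assignment, assuming a toy data frame that mirrors the state of test in the question (all columns character, with empty strings in the first row):

```r
# Toy stand-in for the question's data after the row/column cleanup
test <- data.frame(
  A = c("", "2.2", "0.1"),
  B = c("", "3.3", "10"),
  C = c("", "4", "3"),
  row.names = c("string1", "string2", "string3"),
  stringsAsFactors = FALSE
)

# Assigning into test[] keeps the data.frame itself (class, row names,
# column names) and replaces only the contents of each column
test[] <- lapply(test, as.numeric)

str(test)
```

The empty strings come out as NA, since as.numeric cannot parse them as numbers, and test remains a data.frame. Without the [] the result would be a plain list.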

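Note: Make sure that none of your columns is of class factor; otherwise this will return wrong results without triggering a warning, because as.numeric applied to a factor returns its internal integer codes rather than the values shown.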
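To illustrate the factor pitfall, here is a small sketch. The vector f is made up for the example, and converting through as.character first is a common workaround rather than something stated in the answer.

```r
# A factor whose labels look like numbers
f <- factor(c("2.2", "0.1", "10"))

# Direct conversion yields the internal level codes, silently
codes <- as.numeric(f)

# Converting to character first recovers the values the labels show
vals <- as.numeric(as.character(f))

print(codes)
print(vals)
```

If a data frame might contain factor columns, the same idea can be applied per column, e.g. test[] <- lapply(test, function(x) as.numeric(as.character(x))).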

Upvotes: 1
