useR
useR

Reputation: 3082

read.fwf() with column names and two kind of sep in R

I have similar issue with This issue

My data is like (no first line in my txt file)

----*----1----*----2----*---
Region                 Value
New York, NY        66,834.6
Kings, NY           34,722.9
Bronx, NY           31,729.8
Queens, NY          20,453.0
San Francisco, CA   16,526.2
Hudson, NJ          12,956.9
Suffolk, MA         11,691.6
Philadelphia, PA    11,241.1
Washington, DC       9,378.0
Alexandria IC, VA    8,552.2

my attempt is

#fwf data2
path <- "fwfdata2.txt"
data6 <- read.fwf(path, 
            widths=c(17, -3, 8), 
            header=TRUE,
            #sep=""
            as.is=FALSE)
data6

with answer

> data6
                  Region.................Value
New York, NY                          66,834.6
Kings, NY                             34,722.9
Bronx, NY                             31,729.8
Queens, NY                            20,453.0
San Francisco, CA                     16,526.2
Hudson, NJ                            12,956.9
Suffolk, MA                           11,691.6
Philadelphia, PA                      11,241.1
Washington, DC                         9,378.0
Alexandria IC, VA                      8,552.2
> dim(data6)
[1] 10  1

So problem is that as my data separated by "," and "". It will generate error as follow when i add sep="".

Error in read.table(file = FILE, header = header, sep = sep, row.names = row.names,  : 
  more columns than column names

Upvotes: 1

Views: 1688

Answers (1)

Spacedman
Spacedman

Reputation: 94237

I think your problem is that read.fwf expects the header to be sep-separated, and the data to be fixed width:

header: a logical value indicating whether the file contains the
        names of the variables as its first line.  If present, the
        names must be delimited by ‘sep’.

   sep: character; the separator used internally; should be a
        character that does not occur in the file (except in the
        header).

I'd skip the header to read the data, then read the header by reading the first line only:

> data = read.fwf(path,widths=c(17,-3,8),head=FALSE,skip=1,as.is=TRUE)
> heads = read.fwf(path,widths=c(17,-3,8),head=FALSE,n=1,as.is=TRUE)
> names(data)=heads[1,]
> data
   Region               Value
1  New York, NY      66,834.6
2  Kings, NY         34,722.9
3  Bronx, NY         31,729.8
4  Queens, NY        20,453.0
5  San Francisco, CA 16,526.2
6  Hudson, NJ        12,956.9
7  Suffolk, MA       11,691.6
8  Philadelphia, PA  11,241.1
9  Washington, DC     9,378.0
10 Alexandria IC, VA  8,552.2

If you want the Region as a factor then use as.is=FALSE (as in your example) when reading the data, but you must use as.is=TRUE when reading the header otherwise it gets converted to numbers.

Did you also want to split the region into comma-separated parts and turn the comma-separated numbers into numeric values? You didn't say.

Upvotes: 2

Related Questions