jessica

Reputation: 1355

R: recognizing the number of columns when reading a Data Frame

When the command below is run in R, a data frame with 11 rows and 5 columns (variables) is created. My question is: how does R know there are 5 columns in this data set? What stops R from creating a data frame with 1 row and 55 columns?

Thank you!

d <- read.table(header=FALSE, fill=TRUE, text="
  1   2010-10-04 52495  2010-10-04 11.6  
  2   2010-10-01 53000  2010-10-01 15.3
  3   2010-09-30 52916  2010-09-30 14.3
  4   2010-09-29 52785  2010-09-29 11.3
  5   2010-09-28 53348  2010-09-28 18.2
  6   2010-09-27 52885  2010-09-24 11.7
  7   2010-09-24 52174  2010-09-23 15.0
  8   2010-09-23 51461  2010-09-22 18.6
  9   2010-09-22 51286  2010-09-20 17.9
  10  2010-09-21 50968  
  11  2010-09-20 49250  ")

Upvotes: 2

Views: 524

Answers (3)

Ricardo Oliveros-Ramos

Reputation: 4339

The function read.table has several parameters, most of which have default values, so you don't need to specify them. In particular, there is the parameter sep, which by default is "" (meaning that fields are separated by white space). This parameter is the one doing the magic of recognizing the number of columns. Try changing your code to:

data.txt="
  1   2010-10-04 52495  2010-10-04 11.6  
  2   2010-10-01 53000  2010-10-01 15.3
  3   2010-09-30 52916  2010-09-30 14.3
  4   2010-09-29 52785  2010-09-29 11.3
  5   2010-09-28 53348  2010-09-28 18.2
  6   2010-09-27 52885  2010-09-24 11.7
  7   2010-09-24 52174  2010-09-23 15.0
  8   2010-09-23 51461  2010-09-22 18.6
  9   2010-09-22 51286  2010-09-20 17.9
  10  2010-09-21 50968  
  11  2010-09-20 49250  "

# reading the same data with different sep values
d0 <- read.table(header=FALSE, fill=TRUE, text=data.txt)
d1 <- read.table(header=FALSE, fill=TRUE, text=data.txt, sep="")
d2 <- read.table(header=FALSE, fill=TRUE, text=data.txt, sep=",")
d3 <- read.table(header=FALSE, fill=TRUE, text=data.txt, sep=";")
d4 <- read.table(header=FALSE, fill=TRUE, text=data.txt, sep="-")
d5 <- read.table(header=FALSE, fill=TRUE, text=data.txt, sep="0")
d6 <- read.table(header=FALSE, fill=TRUE, text=data.txt, sep=".")

# aggregating all data frames
d = list(d0=d0, d1=d1, d2=d2, d3=d3, d4=d4, d5=d5, d6=d6)
dims.d = sapply(d, dim) # get the dimension of all dataframes
rownames(dims.d) = c("nrow", "ncol")
print(dims.d)

     d0 d1 d2 d3 d4 d5 d6
nrow 11 11 11 11 11 11 11
ncol  5  5  1  1  5 12  2

Now you can see that the data is read differently depending on sep. The data frames with 5 columns are not all the same: only the first two are identical (you can check). Why are there always 11 rows? Because the end of a line marks the beginning of a new row. It's a good idea to always look at these default parameters to see what is happening without your explicit intervention.
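As a rough illustration of where the column count comes from: the read.table documentation states that the number of columns is determined from the first five lines of input (or from col.names, if that is longer). The sketch below uses count.fields to show the per-line field counts for a shortened, illustrative version of the data; the names txt, tc and n_fields are made up for this example.

```r
# Illustrative input: a shortened version of the question's data,
# with the last row missing two fields.
txt <- "1 2010-10-04 52495 2010-10-04 11.6
2 2010-10-01 53000 2010-10-01 15.3
3 2010-09-30 52916
"

# count.fields() reports how many fields each line has for a given separator.
tc <- textConnection(txt)
n_fields <- count.fields(tc, sep = "")
close(tc)
print(n_fields)

# read.table() sizes the data frame from the leading lines of the input;
# with fill = TRUE, shorter rows are padded with blank fields.
d <- read.table(text = txt, header = FALSE, fill = TRUE)
print(dim(d))
```

Here the first two lines have 5 fields and the third has 3, so the result is a 3 by 5 data frame with the last row padded.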

Upvotes: 3

Metrics

Reputation: 15458

There are 5 columns and 11 rows in your input, so that is what you should expect (no surprise). For example, if you delete the first column (1, 2, ..., 11) but keep the other columns, and then rearrange them with the dates in one column and the values in another, you would have 20 rows and 2 columns.

d <- read.table(header=FALSE, fill=TRUE, text="
          2010-10-04 52495   
          2010-10-01 53000  
          2010-09-30 52916  
          2010-09-29 52785  
          2010-09-28 53348  
          2010-09-27 52885  
          2010-09-24 52174  
          2010-09-23 51461  
          2010-09-22 51286  
          2010-09-21 50968  
          2010-09-20 49250
          2010-10-04 11.6
          2010-10-01 15.3 
          2010-09-30 14.3 
          2010-09-29 11.3 
          2010-09-28 18.2
          2010-09-24 11.7
          2010-09-23 15.0
          2010-09-22 18.6
          2010-09-20 17.9")

If you want to create 1 row by 42 columns, you need to put all the dates and values on one line (in the script file). Something like this:

d <- read.table(header=FALSE, fill=TRUE, text="2010-10-04 52495 2010-10-01 53000 2010 09 30 52916 2010-09-29 52785 2010-09-28 53348 2010-9-27 52885 2010-09-24 52174 2010-09-23 51461  2010-09-22 51286  2010-09-21 50968 2010-09-20 49250 2010-10-04 11.6 2010-10-01 15.3 2010-09-30 14.3 2010-09-29 11.3 2010-09-28 18.2 2010-09-24 11. 2010-09-23 15.0 2010-09-22 18.6 2010-09-20 17.9")

For 2 rows by 32 columns, put the data on two lines in the script. Something like this:

d <- read.table(header=FALSE, fill=TRUE, text="2010-10-04 52495 2010-10-01 53000 2010 09 30 52916 2010-09-29 52785 2010-09-28 53348 2010-9-27 52885 2010-09-24 52174 2010-09-23 51461  2010-09-22 51286  2010-09-21 50968 2010-09-20 49250 2010-10-04 11.6 2010-10-01 15.3 2010-09-30 14.3 2010-09-29 11.3
                2010-09-28 18.2 2010-09-24 11. 2010-09-23 15.0 2010-09-22 18.6 2010-09-20 17.9")
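The same idea on a much smaller, made-up input may be easier to check by eye. This is only a sketch: the values 1 to 6 and the names one_line and two_lines are invented for illustration.

```r
# One line of input gives one row; the number of columns is the
# number of whitespace-separated fields on that line.
one_line <- read.table(text = "1 2 3 4 5 6", header = FALSE, fill = TRUE)
print(dim(one_line))   # 1 row, 6 columns

# The same six values split over two lines (4 + 2) give two rows;
# fill = TRUE pads the shorter second row with NA.
two_lines <- read.table(text = "1 2 3 4\n5 6", header = FALSE, fill = TRUE)
print(dim(two_lines))  # 2 rows, 4 columns
```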

Upvotes: 1

Hong Ooi

Reputation: 57686

read.table and friends are meant for reading tabular data, i.e. input that can be described as having a set number of rows and columns. The function infers the rows from the newlines and the columns from the delimiters within the input, which is why you get 11 rows and 5 columns. If you have sequential input, i.e. just a bunch of elements with no particular structure, use scan.

On the other hand, you have a row number in that input, which would imply that you do indeed have tabular data... do you?
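For the sequential case, a minimal sketch of scan might look like the following. The two-line input and the names txt and items are made up for illustration; what = "" asks scan to return the items as character strings.

```r
# scan() reads whitespace-separated items into one flat vector,
# ignoring how the items are spread across lines.
txt <- "2010-10-04 52495
2010-10-01 53000"

items <- scan(text = txt, what = "", quiet = TRUE)
print(items)
print(length(items))   # 4 items, no row/column structure
```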

Upvotes: 2
