Reputation: 1355
When the command below is typed into R, a data frame of 11 rows and 5 columns (variables) is created. My question is: how does R know there are 5 columns in this data set? What is stopping R from creating a 1-row by 55-column data frame?
Thank you!
d <- read.table(header=FALSE, fill=TRUE, text="
1 2010-10-04 52495 2010-10-04 11.6
2 2010-10-01 53000 2010-10-01 15.3
3 2010-09-30 52916 2010-09-30 14.3
4 2010-09-29 52785 2010-09-29 11.3
5 2010-09-28 53348 2010-09-28 18.2
6 2010-09-27 52885 2010-09-24 11.7
7 2010-09-24 52174 2010-09-23 15.0
8 2010-09-23 51461 2010-09-22 18.6
9 2010-09-22 51286 2010-09-20 17.9
10 2010-09-21 50968
11 2010-09-20 49250 ")
Upvotes: 2
Views: 524
Reputation: 4339
The function read.table has several parameters, most of which have default values, so you don't need to specify them. In particular, there is the parameter sep, which by default is "" (meaning fields are separated by any run of whitespace). This parameter is the one doing the magic of recognizing the number of columns. If you change your code to:
data.txt="
1 2010-10-04 52495 2010-10-04 11.6
2 2010-10-01 53000 2010-10-01 15.3
3 2010-09-30 52916 2010-09-30 14.3
4 2010-09-29 52785 2010-09-29 11.3
5 2010-09-28 53348 2010-09-28 18.2
6 2010-09-27 52885 2010-09-24 11.7
7 2010-09-24 52174 2010-09-23 15.0
8 2010-09-23 51461 2010-09-22 18.6
9 2010-09-22 51286 2010-09-20 17.9
10 2010-09-21 50968
11 2010-09-20 49250 "
# reading the same data with different sep values
d0 <- read.table(header=FALSE, fill=TRUE, text=data.txt)
d1 <- read.table(header=FALSE, fill=TRUE, text=data.txt, sep="")
d2 <- read.table(header=FALSE, fill=TRUE, text=data.txt, sep=",")
d3 <- read.table(header=FALSE, fill=TRUE, text=data.txt, sep=";")
d4 <- read.table(header=FALSE, fill=TRUE, text=data.txt, sep="-")
d5 <- read.table(header=FALSE, fill=TRUE, text=data.txt, sep="0")
d6 <- read.table(header=FALSE, fill=TRUE, text=data.txt, sep=".")
# aggregating all data frames
d = list(d0=d0, d1=d1, d2=d2, d3=d3, d4=d4, d5=d5, d6=d6)
dims.d = sapply(d, dim) # get the dimension of all dataframes
rownames(dims.d) = c("nrow", "ncol")
print(dims.d)
     d0 d1 d2 d3 d4 d5 d6
nrow 11 11 11 11 11 11 11
ncol  5  5  1  1  5 12  2
Now you can see that the data is being read differently. Apart from the first two, which are identical, the data frames with 5 columns have completely different contents (you can check). Why are there always 11 rows? Because the end of a line is used to mark the beginning of a new row. It's a good idea to always look at these default parameters to see what is happening without our explicit intervention.
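To see the field counting directly, here is a minimal sketch using count.fields() from the utils package on three lines taken from the question's data. The variable names (lines, con, n_fields) are only illustrative. It relies on the documented behaviour that read.table sizes the data frame from the first few lines of input, so the widest of them sets the column count.

```r
# Three sample lines from the question: two full rows and one short row
lines <- c(
  "1 2010-10-04 52495 2010-10-04 11.6",
  "2 2010-10-01 53000 2010-10-01 15.3",
  "10 2010-09-21 50968"
)

# count.fields() reports how many fields each line splits into for a given sep
con <- textConnection(lines)
n_fields <- count.fields(con, sep = "")
close(con)
print(n_fields)  # 5 5 3

# The widest of the leading lines sets the number of columns; fill = TRUE
# pads the short row with blank fields (NA in numeric columns)
d <- read.table(text = lines, header = FALSE, fill = TRUE)
print(dim(d))    # 3 5
```

Changing sep in the count.fields() call is a quick way to preview how many columns a different separator would produce before reading the data.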
Upvotes: 3
Reputation: 15458
There are 5 columns and 11 rows, so that is what you should expect (no surprise). For example, if you delete the first column (1, 2, ..., 11), keep the other columns, and rearrange them with the dates in one column and the values in another, you get 20 rows and 2 columns.
d <- read.table(header=FALSE, fill=TRUE, text="
2010-10-04 52495
2010-10-01 53000
2010-09-30 52916
2010-09-29 52785
2010-09-28 53348
2010-09-27 52885
2010-09-24 52174
2010-09-23 51461
2010-09-22 51286
2010-09-21 50968
2010-09-20 49250
2010-10-04 11.6
2010-10-01 15.3
2010-09-30 14.3
2010-09-29 11.3
2010-09-28 18.2
2010-09-24 11.7
2010-09-23 15.0
2010-09-22 18.6
2010-09-20 17.9")
If you want to create 1 row by 40 columns, you need to put all the dates and values on one line (in the script file). Something like this:
d <- read.table(header=FALSE, fill=TRUE, text="2010-10-04 52495 2010-10-01 53000 2010-09-30 52916 2010-09-29 52785 2010-09-28 53348 2010-09-27 52885 2010-09-24 52174 2010-09-23 51461 2010-09-22 51286 2010-09-21 50968 2010-09-20 49250 2010-10-04 11.6 2010-10-01 15.3 2010-09-30 14.3 2010-09-29 11.3 2010-09-28 18.2 2010-09-24 11.7 2010-09-23 15.0 2010-09-22 18.6 2010-09-20 17.9")
For 2 rows by 30 columns, put the data on 2 lines in the script. Something like this:
d <- read.table(header=FALSE, fill=TRUE, text="2010-10-04 52495 2010-10-01 53000 2010-09-30 52916 2010-09-29 52785 2010-09-28 53348 2010-09-27 52885 2010-09-24 52174 2010-09-23 51461 2010-09-22 51286 2010-09-21 50968 2010-09-20 49250 2010-10-04 11.6 2010-10-01 15.3 2010-09-30 14.3 2010-09-29 11.3
2010-09-28 18.2 2010-09-24 11.7 2010-09-23 15.0 2010-09-22 18.6 2010-09-20 17.9")
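The same effect is easier to check on a smaller input. The sketch below uses a shortened, illustrative subset of the data (one_line and two_lines are made-up names) and only moves the line break, which is enough to change the shape of the result.

```r
# Same six tokens, with and without a line break in the middle
one_line  <- "2010-10-04 52495 2010-10-01 53000 2010-09-30 52916"
two_lines <- "2010-10-04 52495 2010-10-01 53000\n2010-09-30 52916"

d_one <- read.table(text = one_line,  header = FALSE, fill = TRUE)
d_two <- read.table(text = two_lines, header = FALSE, fill = TRUE)

print(dim(d_one))  # 1 6: one line, six whitespace-separated fields
print(dim(d_two))  # 2 4: first line has four fields, second is padded by fill = TRUE
```

The tokens are identical in both cases; only the newline decides where a row ends.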
Upvotes: 1
Reputation: 57686
read.table and friends are meant for reading tabular data, i.e. input that can be described as having a set number of rows and columns. The function infers the rows and columns from the delimiters and newlines within the input, which is why you get 11 rows and 5 columns. If you have sequential input, i.e. just a bunch of elements with no particular structure, use scan.
On the other hand, you have a row number in that input, which would imply that you do indeed have tabular data... do you?
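For comparison, here is a minimal sketch of what scan does with a few lines from the question. It assumes you want every token as a character string (what = character()); txt and tokens are illustrative names.

```r
# Three lines from the question's data, held in a single string
txt <- "1 2010-10-04 52495 2010-10-04 11.6
2 2010-10-01 53000 2010-10-01 15.3
10 2010-09-21 50968"

# scan() ignores the row structure and returns one flat vector of tokens
tokens <- scan(text = txt, what = character(), quiet = TRUE)
print(length(tokens))  # 13 (5 + 5 + 3)
print(head(tokens))
```

Line breaks are treated as ordinary whitespace here, so nothing marks where one record ends and the next begins.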
Upvotes: 2