Reputation: 83
So, I'm trying to figure out a larger problem, and I think it may stem from exactly what's happening when I import data from a .txt file. My regular beginning commands are:
data<-read.table("mydata.txt",header=T)
attach(data)
So if my data has, say, 3 columns with headers "Var1", "Var2" and "Var3", how exactly is everything imported? It seems as though it is imported as 3 separate vectors, then bound together, similar to using cbind().
My larger issue is modifying the data. If a row in my data frame has an empty spot (in any column) I need to remove it:
data <- data[complete.cases(data),]
Perfect - now say that the original data frame had 100 rows, 5 of which had an empty slot. My new data frame should have 95 rows, right? Well if I try:
> length(Var1)
[1] 100
> length(data$Var1)
[1] 95
So it seems like the original column labelled Var1 is unaffected by the line where I rewrote the entire data frame. This is why I believe that when I import the data, I really just have 3 separate columns stored somewhere called Var1, Var2 and Var3. As far as getting R to recognize that I want the modified version of the column, I think I need to do something along the lines of:
Var1 <- data$Var1 #Repeat for every variable
My issue with this is that I will need to write the above bit of code for every single variable. The data frame I have is large, and this way of coding seems tedious. Is there a better way for me to transform my data, then be able to call the modified variables, without needing to use the data$ precursor every time?
Upvotes: 2
Views: 1716
Reputation: 174778
read.table() reads the data into a data frame with a component (column) for each column (variable) in the text file. R's data frame is like an Excel spreadsheet: each column in the sheet can contain a different type of data (contrast that with a matrix, which in R can contain data of only a single type).
In effect, the result is as if the data were read in column by column and then bound together column-wise using the cbind.data.frame() method. This is not how it is done in practice though. You have a single object data with three components, none of which can be accessed by typing their name (e.g. Var1). Try exactly this
data <- read.table("mydata.txt", header = TRUE)
Var1
in a clean session (best if you start a new session to try this, just in case). R will report that the object Var1 cannot be found.
If you were to type ls() you would see only data listed (assuming a clean session). This is clear evidence against your thinking that you have three columns stored as individual objects.
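A minimal sketch of that check, assuming a clean session. The inline text passed to read.table() is a made-up stand-in for mydata.txt, which is not shown in the question:

```r
# Hypothetical stand-in for read.table("mydata.txt", header = TRUE)
data <- read.table(text = "Var1 Var2 Var3
1 4 7
2 5 8
3 6 9", header = TRUE)

is_df      <- is.data.frame(data)  # TRUE: one object holding all three columns
col_names  <- names(data)          # "Var1" "Var2" "Var3"
var1_found <- exists("Var1")       # FALSE in a clean session: no standalone Var1

print(is_df)
print(col_names)
print(var1_found)
```

The columns exist only as components of data, so data$Var1 works while a bare Var1 does not.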
The real problem here is attach(), not read.table().
There are very few good uses of attach() and the one you show is not among them. attach(data) places a copy of data on the search path. The key point there is copy. What is on the search path is not the same thing as data in the global environment (your workspace). Any changes to the data in the global environment are not reflected in the copy on the search path, because these are two completely separate objects.
R has a search path where it looks for named objects. Normally R doesn't look inside objects and hence Var1 etc. will not be found whenever you type their name at the prompt or attempt to use the object directly. When you attach() an object you can think of this as opening the object up to R's search. But the thing that catches people out is that one is now looking inside a copy of the object and not the object itself.
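The behaviour in the question can be reproduced with a small made-up data frame (the values here are assumptions for illustration, not the asker's data):

```r
# Toy data: 4 rows, 2 of which contain a missing value
data <- data.frame(Var1 = c(1, NA, 3, 4),
                   Var2 = c("a", "b", NA, "d"))

attach(data)                           # a copy of data goes on the search path

data <- data[complete.cases(data), ]   # modifies only the workspace object

n_attached <- length(Var1)             # 4: found in the stale attached copy
n_current  <- length(data$Var1)        # 2: the filtered data frame

detach(data)                           # remove the copy from the search path

print(c(attached = n_attached, current = n_current))
```

The bare name Var1 keeps resolving to the attached copy, which is why its length never changes.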
In interactive sessions, there are useful helper functions that mean you don't need to be typing data$ all the time. See ?with, ?within, ?transform for example.
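As a rough illustration of those helpers, here is a sketch on a made-up data frame (the column names Var3 and Var4 and the result names are hypothetical):

```r
data <- data.frame(Var1 = c(1, 2, 3), Var2 = c(10, 20, 30))

# with(): evaluate an expression using the columns of data directly
m <- with(data, mean(Var1 + Var2))

# transform(): return a copy of data with a new or modified column
data2 <- transform(data, Var3 = Var1 * Var2)

# within(): like with(), but returns the modified data frame
data3 <- within(data, {
  Var4 <- Var2 - Var1
})

print(m)
print(data2)
print(data3)
```

Each call looks the columns up in the current data, so there is no stale copy to go out of sync.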
Really, don't use attach() in lieu of a bit of typing.
Upvotes: 7
Reputation: 11893
I'm pretty sure R reads files row by row. (In fact, I think just about all programming languages work this way.) I wonder if you are attaching your data frame before removing the incomplete cases. The behavior you describe is fairly typical when people call attach(data) beforehand. In general, it is recommended that you do not use attach() at all in R. But if you must use it, call detach(data) first, then modify the data frame, and then (if you must) call attach(data) again. At that point, you will no longer have this problem.
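The detach, modify, re-attach sequence might look like this on a made-up data frame (values assumed for illustration):

```r
data <- data.frame(Var1 = c(1, NA, 3), Var2 = c(4, 5, 6))
attach(data)                           # the original, problematic attach

detach(data)                           # 1. drop the stale copy
data <- data[complete.cases(data), ]   # 2. modify the data frame
attach(data)                           # 3. attach the updated version

n <- length(Var1)                      # 2: now matches nrow(data)
detach(data)                           # clean up the search path

print(n)
```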
Note, it is also possible that your problem is something different. However, we cannot tell, based on the information provided thus far. You will want to provide a reproducible example so that people can help you more effectively, see here: how-to-make-a-great-r-reproducible-example.
Upvotes: 3