Reputation: 5807
I have a dataframe imported from an excel csv. This works so far, however the csv saves always 49'999 rows, even though there are only roughly 10'000 rows of data. The rest of the rows is all empty.
How can I select now only that part of the data frame containing at least one column with data? with one row I managed to do it:
MyDF[length(MyDF[MyDF!=""])>0]
But how can I do that for a whole dataframe? I would like to assign that "sub-data frame" to a new data frame.
EDIT Example Data:
Very simple example:
data<-matrix(c(1,2,1,NA,NA,NA),nrow=3,ncol=2,byrow=TRUE)
[,1] [,2]
[1,] 1 2
[2,] 1 NA
[3,] NA NA
data2<-data[sum(!is.na(test))>0,]
doesn't subset..but why?
Upvotes: 1
Views: 8770
Reputation: 4349
When reading data like this created in Excel, you have to take care of the NA
and the ""
, depending if the column is recognized as numeric or as character/factor by R. Alternatively, you can add the option na.strings = c("NA", "")
when reading your data, so all the blanks will be NA and you don't need to care of the ""
.
newdata = data[!apply(is.na(data) | data == "", 1, all), ]
the !apply(is.na(data) | data == "", 1, all)
inside will give you TRUE for the rows with at least one column with data. For the rest is FALSE, so they are not taken.
Also, when creating a csv file from Excel, you can try to remove all the columns to the right of your data, because it can also artificially create ghost data columns. This works also to remove the blank rows at the bottom.
Upvotes: 2