MichiZH
MichiZH

Reputation: 5807

Remove empty rows from a csv file created in Excel

I have a dataframe imported from an excel csv. This works so far, however the csv saves always 49'999 rows, even though there are only roughly 10'000 rows of data. The rest of the rows is all empty.

How can I select now only that part of the data frame containing at least one column with data? with one row I managed to do it:

 MyDF[length(MyDF[MyDF!=""])>0]

But how can I do that for a whole dataframe? I would like to assign that "sub-data frame" to a new data frame.

EDIT Example Data:

Very simple example:

data<-matrix(c(1,2,1,NA,NA,NA),nrow=3,ncol=2,byrow=TRUE)
     [,1] [,2]
[1,]    1    2
[2,]    1   NA
[3,]   NA   NA

data2<-data[sum(!is.na(test))>0,]

doesn't subset..but why?

Upvotes: 1

Views: 8770

Answers (1)

Ricardo Oliveros-Ramos
Ricardo Oliveros-Ramos

Reputation: 4349

When reading data like this created in Excel, you have to take care of the NA and the "", depending if the column is recognized as numeric or as character/factor by R. Alternatively, you can add the option na.strings = c("NA", "") when reading your data, so all the blanks will be NA and you don't need to care of the "".

newdata = data[!apply(is.na(data) | data == "", 1, all), ]

the !apply(is.na(data) | data == "", 1, all) inside will give you TRUE for the rows with at least one column with data. For the rest is FALSE, so they are not taken.

Also, when creating a csv file from Excel, you can try to remove all the columns to the right of your data, because it can also artificially create ghost data columns. This works also to remove the blank rows at the bottom.

Upvotes: 2

Related Questions