stats_noob
stats_noob

Reputation: 5935

XLSX File Containing Column with Some Missing Values Entirely Disappears when Imported into R

I have a XLSX file ("my_file.xlsx") in Microsoft Excel. One of the columns contains factor variables - within this column, there are many "blank values" (these appear as complete empty cells). When you filter these in Excel, they look like this (Blanks):

enter image description here I am trying to import this XLSX file into R:

library("readxl")
my_data <- read_excel("my_file.xlsx")

The problem is, when I import this file into R - the column I was mentioning earlier is entirely replaced with NA's. When I use the "str()" command to inspect the file, this column type appears as "logi : NA NA NA ..".

I would not have been surprised if the actual empty cells for that column had been replaced with NA's - but what I can't figure out is why all cells for that column have been replaced with NA's.

Does anyone know why this might be happening? Is it better to convert the file to CSV instead, and then import it into R?

Thanks!

Upvotes: 0

Views: 736

Answers (1)

Sean McKenzie
Sean McKenzie

Reputation: 909

Welcome to Stack Overflow, Noob! In the future, please try to post a reproducible example; this format helps respondents better diagnosis your issue and find fixes to your code.

Looking at the screenshot your took, I think that the issue arises from the fact that your header row in Excel is blank for multiple columns. Potential solutions include:

  1. Ensuring that each column has a unique column name
  2. Not reading in the header column header
  3. Specifying the range of cells to read over
  4. Trying to open the file using the openxlsx:: library instead

I jus tried to recreate your issue here and using the openxlsx:: library seemed to work fine (you will need to download the dummy Excel file here:

library(openxlsx)

XL<-read.xlsx("Test_Excel.xlsx")
str(XL)

Upvotes: 1

Related Questions