Franchise
Franchise

Reputation: 1241

Drop columns with a 'NA' header from data frames in a list?

I have a list of data frames that are pulled in from an Excel file. Some of the columns in the data frames have are named 'NA', contain no data, and are useless; therefore, I would like to drop them. The list contains 9 data frames and most have columns with 'NA' as their title.

Through multiple iterations, R has returned an error or warning. Including:

all_list <- all_list[!is.na(colnames(all_list))]
Warning message:
In is.na(colnames(all_list)) :
is.na() applied to non-(list or vector) of type 'NULL'

The above did not serve it's intended purpose, as the NA columns are still in each data frame.

all_list <- lapply(all_list, function(x){
colnames(x) <- x[!is.na(colnames(x))]
return(x)
})

This seems closer to the intended output, but reformats the data frame columns to be filled with NA's instead.

Here is a sample of my data showcasing the aforementioned NA's:

str(all_list)
List of 8
$ Retail        :'data.frame':  305 obs. of  25 variables:
$ NA                    : chr [1:305] NA "Variable" "Variable" "Variable" ...
$ TIMEPERIOD            : chr [1:305] NA "41640" "41671" "41699" ...

Edit: In case it wasn't clear, these blank columns filled with NA are the result of formatting within Excel for the sake of spacing; however, they serve no purpose for analysis within R.

Upvotes: 1

Views: 67

Answers (1)

MKR
MKR

Reputation: 20085

You are pretty close to solution. A slight change in function used with lapply will take you to expected result.

The lapply traverses through each dataframe and your function needs to subset columns which names are not equal to NA.

all_list < lapply(all_list, function(x){
  x[,colnames(x)  != "NA"]
})

# Verify changed data all_list
all_list[[1]]
#   col1 col2
# 1    g    x
# 2    j    z
# 3    n    p
# 4    u    o
# 5    e    b

Data:

set.seed(1)

df1 <- data.frame(sample(letters, 5), sample(letters, 5), 1:5,
                  stringsAsFactors = FALSE)
names(df1) <- c("col1","col2","NA")
df2 <- data.frame(sample(letters, 5), sample(letters, 5), 11:15,
                  stringsAsFactors = FALSE)
names(df2) <- c("col1","col2","NA")
df3 <- data.frame(sample(letters, 5), sample(letters, 5), rep(NA, 5),
                  stringsAsFactors = FALSE)
names(df3) <- c("col1","col2","NA")
df4 <- data.frame(sample(letters, 5), sample(letters, 5), rep(NA, 5),
                  stringsAsFactors = FALSE)
names(df4) <- c("col1","col2","NA")

all_list <- list(df1,df2,df3,df4)
#check data
all_list[[1]]
#  col1 col2 NA
#1    g    x  1
#2    j    z  2
#3    n    p  3
#4    u    o  4
$5    e    b  5

# all_list[[2]], all_list[[3]] and all_list[[4]] contains similar values

Upvotes: 1

Related Questions