Reputation: 1241
I have a list of data frames that are pulled in from an Excel file. Some of the columns in the data frames have are named 'NA', contain no data, and are useless; therefore, I would like to drop them. The list contains 9 data frames and most have columns with 'NA' as their title.
Through multiple iterations, R has returned an error or warning. Including:
all_list <- all_list[!is.na(colnames(all_list))]
Warning message:
In is.na(colnames(all_list)) :
is.na() applied to non-(list or vector) of type 'NULL'
The above did not serve it's intended purpose, as the NA columns are still in each data frame.
all_list <- lapply(all_list, function(x){
colnames(x) <- x[!is.na(colnames(x))]
return(x)
})
This seems closer to the intended output, but reformats the data frame columns to be filled with NA's instead.
Here is a sample of my data showcasing the aforementioned NA's:
str(all_list)
List of 8
$ Retail :'data.frame': 305 obs. of 25 variables:
$ NA : chr [1:305] NA "Variable" "Variable" "Variable" ...
$ TIMEPERIOD : chr [1:305] NA "41640" "41671" "41699" ...
Edit: In case it wasn't clear, these blank columns filled with NA are the result of formatting within Excel for the sake of spacing; however, they serve no purpose for analysis within R.
Upvotes: 1
Views: 67
Reputation: 20085
You are pretty close to solution. A slight change in function
used with lapply
will take you to expected result.
The lapply
traverses through each dataframe and your function needs to subset
columns which names are not equal
to NA
.
all_list < lapply(all_list, function(x){
x[,colnames(x) != "NA"]
})
# Verify changed data all_list
all_list[[1]]
# col1 col2
# 1 g x
# 2 j z
# 3 n p
# 4 u o
# 5 e b
Data:
set.seed(1)
df1 <- data.frame(sample(letters, 5), sample(letters, 5), 1:5,
stringsAsFactors = FALSE)
names(df1) <- c("col1","col2","NA")
df2 <- data.frame(sample(letters, 5), sample(letters, 5), 11:15,
stringsAsFactors = FALSE)
names(df2) <- c("col1","col2","NA")
df3 <- data.frame(sample(letters, 5), sample(letters, 5), rep(NA, 5),
stringsAsFactors = FALSE)
names(df3) <- c("col1","col2","NA")
df4 <- data.frame(sample(letters, 5), sample(letters, 5), rep(NA, 5),
stringsAsFactors = FALSE)
names(df4) <- c("col1","col2","NA")
all_list <- list(df1,df2,df3,df4)
#check data
all_list[[1]]
# col1 col2 NA
#1 g x 1
#2 j z 2
#3 n p 3
#4 u o 4
$5 e b 5
# all_list[[2]], all_list[[3]] and all_list[[4]] contains similar values
Upvotes: 1