Reputation: 333
I have a table of stock prices here:
https://drive.google.com/file/d/1S666wiCzf-8MfgugN3IZOqCiM7tNPFh9/view?usp=sharing
Some columns have NA's because the company does not exist (until later dates), or the company folded.
What I want to do is: select columns that has no NA's. I use data.table because it is faster. Here are my working codes:
example <- fread(file = "example.csv", key = "date")
example_select <- example[,
lapply(.SD,
function(x) not(sum(is.na(x) > 0)))
] %>%
as.logical(.)
example[, ..example_select]
Is there better (less lines) code to do the same? Thank you!
Upvotes: 6
Views: 1581
Reputation: 50773
In the ?data.table
help page, about the .SDcols
argument you can find that you can use the predicate function or its negation, so simplest solution would be:
example[, .SD, .SDcols = !anyNA]
Upvotes: 5
Reputation: 974
There are lots of ways you could do this. Here's how I usually do it - a data.table
approach without lapply
:
example[, .SD, .SDcols = colSums(is.na(example)) == 0]
Upvotes: 4
Reputation: 389275
Using Filter
:
library(data.table)
Filter(function(x) all(!is.na(x)), fread('example.csv'))
# date ACU ACY AE AEF AIM AIRI AMS APT
# 1: 2001-01-02 2.75 4.75 14.4 8.44 2376.00 250.00 2.50 1.06
# 2: 2001-01-03 2.75 4.50 14.5 9.00 2409.00 250.00 2.50 1.12
# 3: 2001-01-04 2.75 4.50 14.1 8.88 2508.00 250.00 2.50 1.06
# 4: 2001-01-05 2.38 4.50 14.1 8.88 2475.00 250.00 2.25 1.12
# 5: 2001-01-08 2.56 4.75 14.3 8.75 2376.00 250.00 2.38 1.06
# ---
#5072: 2021-03-02 36.95 10.59 28.1 8.77 2.34 1.61 2.48 14.33
#5073: 2021-03-03 38.40 10.00 30.1 8.78 2.26 1.57 2.47 12.92
#5074: 2021-03-04 37.90 8.03 30.8 8.63 2.09 1.44 2.27 12.44
#5075: 2021-03-05 35.68 8.13 31.5 8.70 2.05 1.48 2.35 12.45
#5076: 2021-03-08 37.87 8.22 31.9 8.59 2.01 1.52 2.47 12.15
# ARMP ASXC
# 1: 4.90e+05 178.75
# 2: 4.72e+05 192.97
# 3: 5.42e+05 300.62
# 4: 5.86e+05 300.62
# 5: 6.39e+05 276.25
# ---
#5072: 5.67e+00 3.92
#5073: 5.58e+00 4.54
#5074: 5.15e+00 4.08
#5075: 4.49e+00 3.81
#5076: 4.73e+00 4.15
Upvotes: 0
Reputation: 4497
An answer using tidyverse
packages
library(readr)
library(dplyr)
library(purrr)
data <- read_csv("~/Downloads/example.csv")
map2_dfc(data, names(data), .f = function(x, y) {
column <- tibble("{y}" := x)
if(any(is.na(column)))
return(NULL)
else
return(column)
})
Output
# A tibble: 5,076 x 11
date ACU ACY AE AEF AIM AIRI AMS APT ARMP ASXC
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2001-01-02 2.75 4.75 14.4 8.44 2376 250 2.5 1.06 490000 179.
2 2001-01-03 2.75 4.5 14.5 9 2409 250 2.5 1.12 472500 193.
3 2001-01-04 2.75 4.5 14.1 8.88 2508 250 2.5 1.06 542500 301.
4 2001-01-05 2.38 4.5 14.1 8.88 2475 250 2.25 1.12 586250 301.
5 2001-01-08 2.56 4.75 14.3 8.75 2376 250 2.38 1.06 638750 276.
6 2001-01-09 2.56 4.75 14.3 8.88 2409 250 2.38 1.06 568750 264.
7 2001-01-10 2.56 5.5 14.5 8.69 2310 300 2.12 1.12 586250 274.
8 2001-01-11 2.69 5.25 14.4 8.69 2310 300 2.25 1.19 564375 333.
9 2001-01-12 2.75 4.81 14.6 8.75 2541 275 2 1.38 564375 370.
10 2001-01-16 2.75 4.88 14.9 8.94 2772 300 2.12 1.62 595000 358.
# … with 5,066 more rows
Upvotes: -1
Reputation: 41260
Try:
example[,lapply(.SD, function(x) {if(anyNA(x)) {NULL} else {x}} )]
Upvotes: 2