How to remove NA data in only one column?

I have a file that looks like so:

date       A  B
2014-01-01 2  3
2014-01-02 5  NA
2014-01-03 NA NA
2014-01-04 7  11

If I use newdata <- na.omit(data), where data is the above table loaded into R, I get only two data points, because na.omit() drops every row that contains an NA in any column. What I want is to filter A and B separately, so that I get three data points for A and only two for B. My real data set is much larger and the numbers are different, but neither should matter.

How can I achieve that?

Upvotes: 12

Views: 63411

Answers (3)

Vipul Saxena

Reputation: 71

In Python (pandas), pass the column or columns to check via subset; inplace=True applies the change to the DataFrame itself:

rounds2.dropna(subset=['company_permalink'], inplace=True)

Upvotes: 1

nico

Reputation: 51680

Every column in a data frame must have the same number of elements, which is why NAs come in handy in the first place.

What you can do is

df.a <- df[!is.na(df$A), -3]
df.b <- df[!is.na(df$B), -2]
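As a minimal sketch of the two lines above, assuming the question's table has been rebuilt by hand as a data frame named df (the construction here is an assumption; the original data was loaded from a file):

```r
# Rebuild the question's example table (assumed construction, not the asker's file)
df <- data.frame(
  date = as.Date(c("2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04")),
  A = c(2, 5, NA, 7),
  B = c(3, NA, NA, 11)
)

# Keep rows where A is present and drop column 3 (B)
df.a <- df[!is.na(df$A), -3]
# Keep rows where B is present and drop column 2 (A)
df.b <- df[!is.na(df$B), -2]

nrow(df.a)  # 3
nrow(df.b)  # 2
```

Note that the negative indices -3 and -2 depend on the column order; if the columns are arranged differently, selecting by name is less fragile.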

Upvotes: 1

Gavin Simpson

Reputation: 174938

Use is.na() on the column you want to check, and index the data frame with the negated result. For example:

R> data[!is.na(data$A), ]
        date A  B
1 2014-01-01 2  3
2 2014-01-02 5 NA
4 2014-01-04 7 11
R> data[!is.na(data$B), ]
        date A  B
1 2014-01-01 2  3
4 2014-01-04 7 11

is.na() returns TRUE for every element that is NA and FALSE otherwise. This logical vector can index the rows of the data frame, but we want its complement, so we negate it with ! (TRUE becomes FALSE and vice versa).
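To see the logical vector itself, here is a small sketch on a plain vector that is assumed to hold the values of column A from the question:

```r
A <- c(2, 5, NA, 7)   # values of column A (typed in by hand for illustration)

is.na(A)              # FALSE FALSE  TRUE FALSE
!is.na(A)             #  TRUE  TRUE FALSE  TRUE

# Indexing with the negated vector keeps only the non-missing values
A[!is.na(A)]          # 2 5 7
```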

You can restrict which columns are returned by adding a column index after the comma in [ , ], e.g.

R> data[!is.na(data$A), 1:2]
        date A
1 2014-01-01 2
2 2014-01-02 5
4 2014-01-04 7
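For a larger data set with many value columns, the same indexing could be applied to each column in turn. This is only a sketch, assuming the first column is called date and every other column holds values; the names value_cols and per_column are made up for illustration:

```r
# Example table from the question (rebuilt by hand for illustration)
data <- data.frame(
  date = as.Date(c("2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04")),
  A = c(2, 5, NA, 7),
  B = c(3, NA, NA, 11)
)

# Every column except date is treated as a value column (assumption)
value_cols <- setdiff(names(data), "date")

# One data frame per value column, holding date plus the non-NA values
per_column <- lapply(value_cols, function(col) {
  data[!is.na(data[[col]]), c("date", col)]
})
names(per_column) <- value_cols

sapply(per_column, nrow)  # A: 3, B: 2
```

The result is a named list, so per_column$A and per_column$B give the filtered tables for each column.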

Upvotes: 17
