Reputation: 18264
I have a file that looks like so:
date A B
2014-01-01 2 3
2014-01-02 5 NA
2014-01-03 NA NA
2014-01-04 7 11
If I use newdata <- na.omit(data), where data is the above table loaded into R, I get only two rows. I understand why: na.omit() drops every row that contains an NA in any column. What I want is to filter A and B separately, so that I get three data points for A and only two for B. My real data set is much larger and the numbers are different, but neither should matter.
How can I achieve that?
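A minimal sketch that reproduces the situation, assuming the table is read from inline text with read.table() (in practice it would come from the file). na.omit() on the whole data frame keeps only complete rows, while applying it to each value column separately keeps three values for A and two for B. This returns plain vectors, so the dates are dropped; the answers below keep them.

```r
# Rebuild the example table from the question
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
date A B
2014-01-01 2 3
2014-01-02 5 NA
2014-01-03 NA NA
2014-01-04 7 11")

# na.omit() on the data frame drops every row with an NA in ANY column
whole <- na.omit(data)
nrow(whole)  # 2

# na.omit() on each value column drops NAs independently per column
per_column <- lapply(data[c("A", "B")], na.omit)
lengths(per_column)  # A has 3 values, B has 2
```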
Upvotes: 12
Views: 63411
Reputation: 71
In Python (pandas), DataFrame.dropna() takes a subset argument naming the column(s) to check for missing values, and inplace=True modifies the DataFrame itself instead of returning a copy: rounds2.dropna(subset=['company_permalink'], inplace=True)
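For comparison, a sketch of a rough R analogue of pandas' subset= argument: pass only the chosen columns to complete.cases(), which returns TRUE for rows with no NA in those columns. R has no in-place flag, so reassigning the result plays the role of inplace=True. The variable names cols and data_a are illustrative only.

```r
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
date A B
2014-01-01 2 3
2014-01-02 5 NA
2014-01-03 NA NA
2014-01-04 7 11")

# Columns to check for NAs (plays the role of pandas' subset=)
cols <- "A"

# data[cols] is a one-column data frame; complete.cases() flags rows
# with no NA in it, and all columns of those rows are kept
data_a <- data[complete.cases(data[cols]), ]
nrow(data_a)  # 3
```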
Upvotes: 1
Reputation: 51680
Every column in a data frame must have the same number of elements; that is why NAs come in handy in the first place.
What you can do is keep a separate filtered copy for each column (-3 drops column B, -2 drops column A):
df.a <- df[!is.na(df$A), -3]
df.b <- df[!is.na(df$B), -2]
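With many columns, one line per column gets tedious. A sketch that generalizes df.a and df.b into a named list, assuming the date column is literally called date and every other column should be filtered on its own:

```r
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
date A B
2014-01-01 2 3
2014-01-02 5 NA
2014-01-03 NA NA
2014-01-04 7 11")

# Every column except the date gets its own filtered data frame
value_cols <- setdiff(names(df), "date")
per_col <- lapply(value_cols, function(col) {
  # keep the date plus this column, only where this column is not NA
  df[!is.na(df[[col]]), c("date", col)]
})
names(per_col) <- value_cols

sapply(per_col, nrow)  # A: 3 rows, B: 2 rows
```

Selecting columns by name rather than by negative position means the sketch does not depend on the column order.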
Upvotes: 1
Reputation: 174938
Use is.na() on the column you want to check, and index the rows with the negated result. For example:
R> data[!is.na(data$A), ]
date A B
1 2014-01-01 2 3
2 2014-01-02 5 NA
4 2014-01-04 7 11
R> data[!is.na(data$B), ]
date A B
1 2014-01-01 2 3
4 2014-01-04 7 11
is.na() returns TRUE for every element that is NA and FALSE otherwise. We can index the rows of the data frame with this logical vector, but we want its negation, so we use ! to flip it (TRUE becomes FALSE and vice versa).
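A small sketch of the logical vectors involved, using the question's data read from inline text:

```r
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
date A B
2014-01-01 2 3
2014-01-02 5 NA
2014-01-03 NA NA
2014-01-04 7 11")

na_a <- is.na(data$A)    # FALSE FALSE  TRUE FALSE
keep_a <- !na_a          #  TRUE  TRUE FALSE  TRUE

# Logical indexing keeps the rows where keep_a is TRUE
sum(keep_a)              # 3 rows kept
data[keep_a, "A"]        # 2 5 7
```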
You can restrict which columns are returned by adding a column index after the comma in [ , ], e.g.
R> data[!is.na(data$A), 1:2]
date A
1 2014-01-01 2
2 2014-01-02 5
4 2014-01-04 7
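The same filter can also pick columns by name instead of position, which keeps working if the columns are reordered. A sketch showing both plain indexing and base R's subset(), which should give the same rows and columns here:

```r
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
date A B
2014-01-01 2 3
2014-01-02 5 NA
2014-01-03 NA NA
2014-01-04 7 11")

# Columns chosen by name with [ , ]
res1 <- data[!is.na(data$A), c("date", "A")]

# Equivalent with subset(): row condition and column selection by name
res2 <- subset(data, !is.na(A), select = c(date, A))

res2
```

subset() is convenient interactively; its help page recommends standard indexing like res1 inside functions and packages.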
Upvotes: 17