AnnK
AnnK

Reputation: 189

Restructure data on rows to decrease number of NAs in a dataframe in R

I have a dataset of 392 rows and 156 columns that represent detections and non-detections of a species. Each column represents a 'visit' to the field for survey, whereas each row represent surveyed sites, holding 0 and 1 whether the species of interested was recorded in each of these 'visits' to the field, or NA if there was no survey conducted during that specific time period. I agregated my visits by month, so each column represent 'monthly visits', that is a 30 day interval within a given year. Given that I have several years of data, I created consecutive and sequential month periods that span all the years for which I have data. Given that most sites were surveyed in different years, several columns (time periods) are unique for each site and thus I have a LOT of NAs: I have 1,646 records of either 0/1 and 59,506 NAs.

I want to restructure my database in such a way that I can remove as many NAs as possible, by treating each column not as a specific time period but as a generic time interval. So instead of column 1 being for example the specific period of 3/2008-4/2018, to be only 'Survey 1', which will represent a different month and year for each site. By removing all the NAs of each site previous to the actual survey period for that site, I can have a cleaner, smaller database with less NAs. The idea would be the following:

Go from this df I have:

df <- read.table(text = "3/2008-4/2018 5/2008-6/2008 7/2009-8/2009 9/2009-10/2009 11/2009-12/2009 01/2010-02/2010 03/2010-04/2010 05/2010-06/2010 07/2010-08/2010
1 NA NA NA NA NA NA 1 1 1 
2 NA NA NA 1 0 NA NA NA NA
3 NA NA NA 0 0 NA NA NA NA
4 0 1 0 1 1 1 NA NA NA
5 0 1 NA NA NA 1 0 1 1")

To this new df:

df_new <- read.table(text = "v1 v2 v3 v4 V5 V6
1 1 1 1 NA NA NA
2 1 0 NA NA NA NA
3 0 0 NA NA NA NA
4 0 1 0 1 1 1
5 0 1 1 0 1 1")

Could anyone help me create a code to do this please? Thank you!

Upvotes: 0

Views: 32

Answers (1)

GKi
GKi

Reputation: 39667

You can use na.omit and then subset using [ to get vectors of equal length.

x <- apply(unname(df), 1, na.omit)
t(sapply(x, "[", 1:max(lengths(x))))
#  [,1] [,2] [,3] [,4] [,5] [,6]
#1    1    1    1   NA   NA   NA
#2    1    0   NA   NA   NA   NA
#3    0    0   NA   NA   NA   NA
#4    0    1    0    1    1    1
#5    0    1    1    0    1    1

Upvotes: 1

Related Questions