Jd Baba
Jd Baba

Reputation: 6118

Add columns of two dataframes of unequal length and fill missing values

I am trying to add the columns of two dataframes. The dataframes comprise the three year data from 2010 to 2012. While downloading I found that the two datasets are of unequal length. It means that some of the data is missing. Now, I want to make these datasets of equal length and fill in the missing data first using linear interpolation. After that I could then combine the datasets.

The code I used is as follows. It downloads the datasets on the computer and read those data sets.

#######################################################################
# Download Data for station 1
#######################################################################

url <- "http://nwis.waterdata.usgs.gov/nwis/uv?cb_00060=on&format=rdb&period=&begin_date=2010-01-01&end_date=2012-12-31&site_no=02469761"
download.file(url,destfile="ville-discharge.txt")
## Download Date
##"Wed Jun 05 22:19:36 2013"

#######################################################################
## Read the ville data 
#######################################################################

ville <- read.table("ville-discharge.txt",header=F,skip=23,sep="\t")
head(ville)
villedis <- ville[c(-1,-2),c(-1,-2,-4,-6)]
names(villedis) <- c("date","discharge")
villedis$date <- strptime(villedis$date,format="%Y-%m-%d %H:%M")
villedis$discharge <- as.numeric(as.character(villedis$discharge))

#########################################################################
## Download the borne data 
########################################################################
url1 <- "http://nwis.waterdata.usgs.gov/nwis/uv?cb_00060=on&format=rdb&period=&begin_date=2010-01-01&end_date=2012-12-31&site_no=02428400"
download.file(url=url1,destfile="bone-discharge.txt")
## Download date and time
# [1] "Wed Jun 05 22:21:45 2013"

########################################################################
## Read the borne data 
########################################################################

borne <- read.table("bone-discharge.txt",header=F,sep="\t")
bornedis <- borne[c(-1,-2),c(-1,-2,-4,-6)]
names(bornedis)<- c("date","discharge")
bornedis$date <- strptime(bornedis$date,format="%Y-%m-%d %H:%M")
bornedis$discharge <- as.numeric(as.character(bornedis$discharge))

> dim(villedis)
[1] 26296     2
> dim(bornedis)
[1] 25593     2

Now I need to add the "discharge" column of two dataframes villedis and bornedis. It seems bornedis is missing some data. What would be the efficient way to add these data frames and put values instead of NA on the missing values ?

Upvotes: 1

Views: 1540

Answers (1)

Rorschach
Rorschach

Reputation: 32466

You could use 'merge' to merge the two datasets by date.

new <- merge(villedis, bornedis, by= c("date"), all=TRUE)

And it looks like

> dim(new)
[1] 26306     3
> head(new)
                 date discharge.x discharge.y
1 2010-01-01 00:00:00       46200       78500
2 2010-01-01 01:00:00       46300       74100
3 2010-01-01 02:00:00       46400       77600
4 2010-01-01 03:00:00       46500       75600
5 2010-01-01 04:00:00       46500       79000
6 2010-01-01 05:00:00       46600       75500

> nrow(subset(new, is.na(discharge.x)))
[1] 4
> nrow(subset(new, is.na(discharge.y)))
[1] 707

Upvotes: 1

Related Questions