Reputation: 6118
I am trying to add the columns of two dataframes. The dataframes comprise the three year data from 2010 to 2012. While downloading I found that the two datasets are of unequal length. It means that some of the data is missing. Now, I want to make these datasets of equal length and fill in the missing data first using linear interpolation. After that I could then combine the datasets.
The code I used is as follows. It downloads the datasets on the computer and read those data sets.
#######################################################################
# Download Data for station 1
#######################################################################
url <- "http://nwis.waterdata.usgs.gov/nwis/uv?cb_00060=on&format=rdb&period=&begin_date=2010-01-01&end_date=2012-12-31&site_no=02469761"
download.file(url,destfile="ville-discharge.txt")
## Download Date
##"Wed Jun 05 22:19:36 2013"
#######################################################################
## Read the ville data
#######################################################################
ville <- read.table("ville-discharge.txt",header=F,skip=23,sep="\t")
head(ville)
villedis <- ville[c(-1,-2),c(-1,-2,-4,-6)]
names(villedis) <- c("date","discharge")
villedis$date <- strptime(villedis$date,format="%Y-%m-%d %H:%M")
villedis$discharge <- as.numeric(as.character(villedis$discharge))
#########################################################################
## Download the borne data
########################################################################
url1 <- "http://nwis.waterdata.usgs.gov/nwis/uv?cb_00060=on&format=rdb&period=&begin_date=2010-01-01&end_date=2012-12-31&site_no=02428400"
download.file(url=url1,destfile="bone-discharge.txt")
## Download date and time
# [1] "Wed Jun 05 22:21:45 2013"
########################################################################
## Read the borne data
########################################################################
borne <- read.table("bone-discharge.txt",header=F,sep="\t")
bornedis <- borne[c(-1,-2),c(-1,-2,-4,-6)]
names(bornedis)<- c("date","discharge")
bornedis$date <- strptime(bornedis$date,format="%Y-%m-%d %H:%M")
bornedis$discharge <- as.numeric(as.character(bornedis$discharge))
> dim(villedis)
[1] 26296 2
> dim(bornedis)
[1] 25593 2
Now I need to add the "discharge" column of two dataframes villedis
and bornedis
. It seems bornedis
is missing some data. What would be the efficient way to add these data frames and put values instead of NA on the missing values ?
Upvotes: 1
Views: 1540
Reputation: 32466
You could use 'merge' to merge the two datasets by date.
new <- merge(villedis, bornedis, by= c("date"), all=TRUE)
And it looks like
> dim(new)
[1] 26306 3
> head(new)
date discharge.x discharge.y
1 2010-01-01 00:00:00 46200 78500
2 2010-01-01 01:00:00 46300 74100
3 2010-01-01 02:00:00 46400 77600
4 2010-01-01 03:00:00 46500 75600
5 2010-01-01 04:00:00 46500 79000
6 2010-01-01 05:00:00 46600 75500
> nrow(subset(new, is.na(discharge.x)))
[1] 4
> nrow(subset(new, is.na(discharge.y)))
[1] 707
Upvotes: 1