Reputation: 1
I am trying to write a loop that searches for the right date in data.frame B (date_B[j]) and copies the related value X_B[j] into the X_A[i] variable for the same date date_A[i].
The challenge is that a) the target data.frame A contains several of the same dates, but b) it does not systematically contain all the dates that data.frame B has. B does include all the needed dates, though. Consequently, the data frames are of different lengths.
Questions:
The data frames are:
A =
date_A X_A
1 2010-01-01
2 2010-01-02
3 2010-01-03
4 2010-01-02
5 2010-02-03
6 2010-01-01
7 2010-01-02
.
.
.
20000
B=
date_B X_B
1 2010-01-01 7.9
2 2010-01-02 8.5
3 2010-01-03 2.1
.
.
400
My goal is:
A=
date_A X_A
1 2010-01-01 7.9
2 2010-01-02 8.5
3 2010-01-03 2.1
4 2010-01-02 8.5
5 2010-02-03 2.1
6 2010-01-01 7.9
7 2010-01-02 8.5
I wrote the following loop, but for some reason it does not find its way past the first row. In other words, it does not change the values of the other X_A cells, although the loop keeps running endlessly.
i=1; j=1;
while (i <= nrow(A))
while (j <= nrow(B)) {
if (A$date_A[i]==B$date_B[j]) A$X_A[i] <- B$X_B[j];
j <- j+1; if (j == nrow(B))
i <- i+1;
j=1
}
Thanks for your help.
Upvotes: 0
Views: 290
Reputation: 11956
Wow! Your code scares me. At the very least, use a for loop for this kind of thing (although @Dwin's solution is the way to go for this problem):
for(i in seq(nrow(A)))
{
  for(j in seq(nrow(B)))
  {
    if(A$date_A[i]==B$date_B[j])
    {
      A$X_A[i] <- B$X_B[j]
    }
  }
}
This will prevent all the ugliness with manually trying to do the increments at the end of your while loops (in your own code, the j=1 needed to be moved outside the inner brackets, by the way).
Note: this code, like yours, does not handle the case where B contains two rows with the same date as a row in A (it will always use the value of the last such row in B). It is only meant to show how to use for instead of while for simple incremental loops.
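If duplicate dates in B are a concern, it may be worth checking for them before looking anything up. A minimal sketch, assuming the column names from the question; the B_dup data frame with its repeated date is made up purely for illustration:

```r
# Hypothetical B with a repeated date (not from the question's data)
B_dup <- data.frame(
  date_B = c("2010-01-01", "2010-01-02", "2010-01-02"),
  X_B    = c(7.9, 8.5, 9.9),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when there are no duplicates,
# otherwise the index of the first duplicated element
dup_index <- anyDuplicated(B_dup$date_B)

# One possible policy: keep only the first row for each date
B_unique <- B_dup[!duplicated(B_dup$date_B), ]
```

Which row to keep (first, last, or an average) depends on the data; the sketch only shows one option.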
Upvotes: 2
Reputation: 47551
With this data:
A <- data.frame( date_A = c('2010-01-01', '2010-01-02', '2010-01-03', '2010-01-02',
'2010-02-03', '2010-01-01', '2010-01-02') )
B <- data.frame(
date_B = c('2010-01-01','2010-01-02','2010-01-03'),
X_B = c(7.9,8.5,2.1))
You can use match() to index the X_B values in the right order:
A$X_A <- B$X_B[match(A$date_A,B$date_B)]
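The intermediate index can be inspected to see how this works. A small sketch using the example data, with stringsAsFactors = FALSE set explicitly (an assumption added here so the columns are plain character vectors in any R version):

```r
A <- data.frame(
  date_A = c("2010-01-01", "2010-01-02", "2010-01-03", "2010-01-02",
             "2010-02-03", "2010-01-01", "2010-01-02"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  date_B = c("2010-01-01", "2010-01-02", "2010-01-03"),
  X_B    = c(7.9, 8.5, 2.1),
  stringsAsFactors = FALSE
)

# For each date in A, the row of B that holds the same date
idx <- match(A$date_A, B$date_B)
# idx is 1 2 3 2 NA 1 2; the NA is for 2010-02-03, which is not in B

# Indexing X_B with idx copies the values across; NA positions stay NA
A$X_A <- B$X_B[idx]
```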
match() returns, for each element of A$date_A, the position of its first match in B$date_B (or NA if there is none). Another trick is to use the levels of the factor as an index:
A$X_A <- B$X_B[A$date_A]
which works because a factor's levels are in sorted order and correspond to integer codes (1, 2, 3, ...). So if B is sorted according to these levels, this returns the correct indexes. (You might want to sort B to be sure: B <- B[order(B$date_B),].) Note that this relies on date_A being a factor, which was the data.frame() default for strings before R 4.0.0.
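A variant that does not depend on factor codes or on the sort order of B is a lookup by name. This is a sketch rather than part of the original answer; the name lookup is made up for illustration:

```r
A <- data.frame(
  date_A = c("2010-01-01", "2010-01-02", "2010-01-03", "2010-01-02",
             "2010-02-03", "2010-01-01", "2010-01-02"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  date_B = c("2010-01-01", "2010-01-02", "2010-01-03"),
  X_B    = c(7.9, 8.5, 2.1),
  stringsAsFactors = FALSE
)

# Named vector: names are the dates, values are X_B
lookup <- setNames(B$X_B, as.character(B$date_B))

# Index by name; as.character() makes this work for factor columns too.
# Dates missing from B come back as NA.
A$X_A <- unname(lookup[as.character(A$date_A)])
```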
As for why the loop doesn't work: first, I think you really don't want to use ; in R scripts, ever. It makes code so much harder to read. It is best to learn to write clear code. In your code you can use assignment operators more consistently and use proper indentation. For example:
i <- 1
j <- 1
while (i <= nrow(A))
{
  while (j <= nrow(B))
  {
    if (A$date_A[i]==B$date_B[j]) A$X_A[i] <- B$X_B[j]
    j <- j+1
    if (j == nrow(B)) i <- i+1
    j <- 1
  }
}
This is your code, but it is much clearer to read. For me this does not run because the levels are not comparable (due to the typo), so I put in an as.character() call. This is probably not needed in the real dataset.
Indenting immediately shows the biggest problem here: you have misplaced j <- 1 outside the if (j == nrow(B)) part. Using ; terminates the statement and thus the conditional part. Because of this, j is set to 1 in every iteration.
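The effect of the semicolon can be seen in isolation. A minimal sketch with made-up variable names (j_semicolon and j_braces are only for illustration):

```r
i <- 1

# The `if` governs only the statement before the semicolon;
# the assignment after it always runs
j_semicolon <- 5
if (j_semicolon == 10) i <- i + 1; j_semicolon <- 1
# j_semicolon is now 1 even though the condition was FALSE

# With braces, both statements depend on the condition
j_braces <- 5
if (j_braces == 10) {
  i <- i + 1
  j_braces <- 1
}
# j_braces is still 5 because the condition was FALSE
```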
Changing that makes it run better, but you still get an error because the while loop for j might not finish before i is larger than the number of rows in A. This can be fixed by using an AND condition and collapsing both while loops into one. Finally, you need to change the if condition to test for j being larger than the number of rows in B, or you omit one row. This should work:
i <- 1
j <- 1
while (j <= nrow(B) && i <= nrow(A))
{
  if (as.character(A$date_A[i])==as.character(B$date_B[j])) A$X_A[i] <- B$X_B[j]
  j <- j+1
  if (j > nrow(B))
  {
    i <- i+1
    j <- 1
  }
}
But this is only meant to show what went wrong; I'd never recommend doing something like this this way. Even when you really want to use loops, you are probably better off with for loops.
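If a loop is still preferred, one option is a single for loop over the 400 rows of B with a vectorized comparison against A. This is a sketch, not the original answer's code, and it assumes X_A should start out as NA:

```r
A <- data.frame(
  date_A = c("2010-01-01", "2010-01-02", "2010-01-03", "2010-01-02",
             "2010-02-03", "2010-01-01", "2010-01-02"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  date_B = c("2010-01-01", "2010-01-02", "2010-01-03"),
  X_B    = c(7.9, 8.5, 2.1),
  stringsAsFactors = FALSE
)

# Start with NA so dates that are missing from B stay NA
A$X_A <- NA_real_

for (j in seq_len(nrow(B))) {
  # Logical vector marking every row of A with the j-th date of B
  hits <- A$date_A == B$date_B[j]
  A$X_A[hits] <- B$X_B[j]
}
```

seq_len() is used instead of seq(nrow(B)) so the loop body is skipped when B has zero rows.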
Upvotes: 2
Reputation: 263332
With this sort of problem, merge makes it much easier. With your example I do not get a match for the seventh row, but perhaps you had a typo. My A dataframe only had the date_A column. If you want to rename the X_B column, then names()<- will do it easily:
merge(A, B, by.x=1, by.y=1, all.x=TRUE)
#---result---
date_A X_B
1 2010-01-01 7.9
2 2010-01-01 7.9
3 2010-01-02 8.5
4 2010-01-02 8.5
5 2010-01-02 8.5
6 2010-01-03 2.1
7 2010-02-03 NA
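Note that merge() sorts the result by the key column, so the rows no longer follow the original order of A. If that order matters, one possible approach is to carry a row index through the merge. A sketch, where row_id is a helper column invented for this example:

```r
A <- data.frame(
  date_A = c("2010-01-01", "2010-01-02", "2010-01-03", "2010-01-02",
             "2010-02-03", "2010-01-01", "2010-01-02"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  date_B = c("2010-01-01", "2010-01-02", "2010-01-03"),
  X_B    = c(7.9, 8.5, 2.1),
  stringsAsFactors = FALSE
)

# Remember the original row order of A
A$row_id <- seq_len(nrow(A))

merged <- merge(A, B, by.x = "date_A", by.y = "date_B", all.x = TRUE)

# Restore the original order, rename X_B to X_A, drop the helper column
merged <- merged[order(merged$row_id), ]
names(merged)[names(merged) == "X_B"] <- "X_A"
merged$row_id <- NULL
rownames(merged) <- NULL
```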
Upvotes: 7