Ashley Thomas
Ashley Thomas

Reputation: 61

Errors when merging two dataframes based on date

This is a follow-up question from what I asked here Merging a dataset based on ID and date (thanks for the helpful advice!)

I have two datasets each that are monthly summaries of different variables for households. I want to merge the two based on the household ID and the month.

df1 one looks like this:

     hh_ids      date total
     <chr>     <chr> <dbl>
1  KELDK13  2013-8-1     1
2  KMOMB02  2013-2-1     1
3  KMOMB02  2013-5-1     2
4  KMOMB04  2013-7-1     2
5  KMOMB04  2013-9-1     1
6  KMOMB06  2013-6-1     1
7  KMOMB14  2013-8-1     1
8  KMOMB16  2013-6-1     1
9  KMOMB17 2012-10-1     1
10 KMOMB17 2012-11-1     2

and the first 10 rows of df2 looks like:

  hh_ids      date    income consumption alcohol cleaning_materials  clothing
1  KELDK01 2012-11-1  62.70588    40.52941       0           0.000000  0.000000
2  KELDK01 2012-12-1  17.64706    42.43530       0           1.058824  7.058824
3  KELDK01 2013-01-1  91.76471    48.23529       0           0.000000  0.000000
4  KELDK01 2013-02-1  91.76470   107.52940       0           0.000000  0.000000
5  KELDK01 2013-03-1 116.47060   114.47060       0           0.000000  0.000000
6  KELDK01 2013-04-1 124.41180   118.29410       0           2.705882 17.647060
7  KELDK01 2013-05-1 137.23530   105.00000       0           1.411765  1.882353
8  KELDK01 2013-06-1 131.52940   109.54120       0           4.352942  2.941176
9  KELDK01 2013-07-1 121.52940   113.47060       0           2.352941 25.882350
10 KELDK01 2013-08-1 123.32940    86.50588       0           2.588235  2.941176

I want the "total" column from to be added on as a column in df2 with the matched hh_ids and date.

I have tried to do the following:

df3<-merge(df2,df1,by=c("hh_ids","date"),all=TRUE) 

but the resultant data.drame df3 now looks like this (i've chopped some columns to make it easier to display):

      hh_ids      date    income consumption total
1    KELDK01 2012-11-1  62.70588    40.52941     0
2    KELDK01 2012-12-1  17.64706    42.43530     0
3    KELDK01 2013-01-1  91.76471    48.23529     0
4    KELDK01 2013-02-1  91.76470   107.52940     0
5    KELDK01 2013-03-1 116.47060   114.47060     0
6    KELDK01 2013-04-1 124.41180   118.29410     0
7    KELDK01 2013-05-1 137.23530   105.00000     0
8    KELDK01 2013-06-1 131.52940   109.54120     0
9    KELDK01 2013-07-1 121.52940   113.47060     0
2595 KNBOM33     15918        NA          NA     1
2596 KNBOM33     15979        NA          NA     1
2597 KNBOM33 2012-10-1  32.94118    18.11765     0
2598 KNBOM33 2012-12-1  56.47059    44.23529     0
2599 KNBOM33 2013-01-1  10.58824    29.76471     0
2600 KNBOM33 2013-02-1  54.70588    70.21176     0

It seems that something has happened to df1's date in the merge, and therefore it didn't merge with the other values in df2. Does anyone have any suggestions as to what I should do? I have double checked that both df1 and df2 have the "date" column formatted as a date before the merge.

Upvotes: 0

Views: 63

Answers (2)

Yong
Yong

Reputation: 1127

It seems the type of "date", where "2013-8-1" is different from "2013-08-1".

before merge them:

df1$date = as.Date(df1$date);
df2$date = as.Date(df2$date);

Upvotes: 1

Justin SJ Lee
Justin SJ Lee

Reputation: 26

I'd start off by checknig the rows with hh_ids == "KNB0M33" to see if there is anything funny going on in the dates, income and consumption columns.

Upvotes: 0

Related Questions