I am trying to build a machine learning model to predict VAR1 for each neighborhood over time, using time-series data (year and month). The data contains many neighborhoods, which are the unit of analysis, so each neighborhood is repeated 3 years * 12 months = 36 times.
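Before modelling, it may help to turn Year and Month into a single date key and check that every series is complete. The sketch below uses a small made-up stand-in for `df` with the columns from the printout in the question. It assumes month names are in English, because it maps them with base R's `month.name`. It also assumes one row per neighborhood, gender and month, which is an assumption about the full data.

```r
# Synthetic stand-in for df with the same columns as in the question;
# the values are made up for illustration.
df <- data.frame(
  Year         = c(2017, 2017, 2015, 2015),
  Month        = c("February", "January", "December", "January"),
  District     = c(1, 1, 10, 10),
  Neighborhood = c(1, 1, 69, 69),
  Gender       = c("M", "M", "F", "F"),
  VAR1         = c(1800, 2000, 200, 250),
  stringsAsFactors = FALSE
)

# Map English month names to 1-12 using base R's month.name constant,
# then build a first-of-month Date so rows sort chronologically.
df$MonthNum <- match(df$Month, month.name)
df$Date <- as.Date(sprintf("%d-%02d-01", as.integer(df$Year), df$MonthNum))

# Order each neighborhood/gender series by date.
df <- df[order(df$Neighborhood, df$Gender, df$Date), ]

# Rows per neighborhood and gender. Assuming one row per neighborhood,
# gender and month, a complete 2015-2017 series has 3 * 12 = 36 rows,
# so any other count flags gaps or duplicates.
counts <- with(df, table(Neighborhood, Gender))
```

On the real data, `counts[counts != 36]` would list the neighborhood/gender combinations whose series are not complete.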
I need to merge this data with other datasets. All of the other datasets cover the same areas and have a Year column, but no Month column.
I need help with how to join these datasets and how to approach the analysis. I am working in R.
When joining the datasets, I plan to pivot the rows of the other datasets into columns (long to wide format) so that there are fewer rows per area.
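A long-to-wide pivot like this can be done in base R with `reshape()`. The sketch assumes one of the other datasets, called `df_year` here, has one row per neighborhood per year. `df_year` and its value column `VAR2` are hypothetical placeholder names, not names from the question.

```r
# Hypothetical yearly dataset: one row per neighborhood per year.
# "VAR2" is a placeholder for one of the other variables.
df_year <- data.frame(
  Neighborhood = rep(1:3, times = 3),
  Year         = rep(2015:2017, each = 3),
  VAR2         = c(10, 20, 30, 11, 21, 31, 12, 22, 32)
)

# Long -> wide: one row per neighborhood, one VAR2 column per year
# (columns are named VAR2.2015, VAR2.2016, VAR2.2017).
wide <- reshape(df_year,
                idvar     = "Neighborhood",
                timevar   = "Year",
                direction = "wide")
```

One trade-off: after pivoting, the year columns no longer line up with the Year value on each monthly row. If the goal is to attach each year's value to the matching months, it may be simpler to keep `df_year` in long format and join on Year as well as the area.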
Here are the first and last rows of the main dataset, df (in R):
```
head(df)
  Year   Month District Neighborhood Gender VAR1
1 2017 January        1            1      M 2000
2 2017 January        1            2      M  350
3 2017 January        1            3      M  700
4 2017 January        1            4      M  400
5 2017 January        2            5      M 1000
6 2017 January        2            6      M  200

tail(df)
      Year    Month District Neighborhood Gender VAR1
10577 2015 December       10           69      F  200
10578 2015 December       10           70      F 1000
10579 2015 December       10           71      F  500
10580 2015 December       10           72      F  350
10581 2015 December       10           73      F  300
10582 2015 December       99           99      F  770
```
I need help with two things. First, how can I merge the data above with the other datasets, which have no Month column?
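Here is a minimal base-R sketch of two common ways to combine monthly and yearly data. It uses small made-up stand-ins. `df_year` and `VAR2` are hypothetical names for one of the other datasets and one of its variables. The join keys (`Year`, `Neighborhood`) are also an assumption: add `District` or `Gender` to `by =` if the other datasets have them. Summing VAR1 when collapsing to years is another assumption. A mean may fit better if VAR1 is a rate rather than a count.

```r
# Monthly data (stand-in for df) and a hypothetical yearly dataset.
df <- data.frame(
  Year         = c(2017, 2017, 2016),
  Month        = c("January", "February", "January"),
  Neighborhood = c(1, 1, 1),
  VAR1         = c(2000, 1800, 1500),
  stringsAsFactors = FALSE
)
df_year <- data.frame(
  Year         = c(2016, 2017),
  Neighborhood = c(1, 1),
  VAR2         = c(40, 45)
)

# Option 1: left join on the shared keys. Every month of a given year
# receives that year's VAR2 value (the yearly value is repeated).
monthly <- merge(df, df_year, by = c("Year", "Neighborhood"), all.x = TRUE)

# Option 2: first collapse the monthly data to one row per
# neighborhood-year (summing VAR1), then join at yearly granularity.
yearly <- aggregate(VAR1 ~ Year + Neighborhood, data = df, FUN = sum)
yearly <- merge(yearly, df_year, by = c("Year", "Neighborhood"), all.x = TRUE)
```

Option 1 keeps the monthly resolution needed to predict VAR1 per month, but the yearly predictors stay constant within each year. Option 2 is only appropriate if predicting yearly totals is acceptable.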
Second, I am stuck on how to approach exploratory data analysis (EDA) and the analysis of this dataset, and would appreciate help here.
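A possible starting point for EDA on monthly panel data like this is shown below. It checks the structure and missing values, then looks at the overall trend, the seasonal profile, and differences between areas and genders. The data are synthetic, with random VAR1 values on a grid shaped like the question's data. The column names match the question. The particular summaries chosen here are only one reasonable set of first steps.

```r
# Synthetic stand-in for df; VAR1 values are random, for illustration only.
set.seed(1)
df <- expand.grid(
  Month        = month.name,
  Year         = 2015:2017,
  Neighborhood = 1:3,
  Gender       = c("M", "F"),
  stringsAsFactors = FALSE
)
df$VAR1 <- round(runif(nrow(df), 100, 2000))
df$Date <- as.Date(sprintf("%d-%02d-01", df$Year, match(df$Month, month.name)))

# 1. Structure, distribution of the target, and missing values per column.
str(df)
summary(df$VAR1)
colSums(is.na(df))

# 2. Overall trend: total VAR1 per month across all areas.
by_month <- aggregate(VAR1 ~ Date, data = df, FUN = sum)

# 3. Seasonal profile: mean VAR1 for each calendar month,
#    reordered from alphabetical to calendar order.
by_calendar_month <- aggregate(VAR1 ~ Month, data = df, FUN = mean)
by_calendar_month <- by_calendar_month[match(month.name, by_calendar_month$Month), ]

# 4. Differences between areas and genders.
by_area_gender <- aggregate(VAR1 ~ Neighborhood + Gender, data = df, FUN = mean)
```

Plotting these tables, for example with `plot(VAR1 ~ Date, data = by_month, type = "l")`, usually makes the trend and seasonality easier to see than the raw numbers.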