Reputation:
I have three datasets
one containing a bunch of information about storms.
one that contains full names of the cities and the abbreviations.
and one that contains the year and population for each state.
What I want to do is to add a column to the first dataframe storms called population that contains population per year for each state using the other two dataframes state_codes and states.
Can anyone point me in the right direction? Below some sample data
> head(storms)
num yr mo dy time state magnitude injuries fatalities crop_loss
1 1 1950 1 3 11:00:00 MO 3 3 0 0
2 1 1950 1 3 11:10:00 IL 3 0 0 0
3 2 1950 1 3 11:55:00 IL 3 3 0 0
4 3 1950 1 3 16:00:00 OH 1 1 0 0
5 4 1950 1 13 05:25:00 AR 3 1 1 0
6 5 1950 1 25 19:30:00 MO 2 5 0 0
> head(state_codes)
Name Abbreviation
1 Alabama AL
2 Alaska AK
3 Arizona AZ
4 Arkansas AR
5 California CA
6 Colorado CO
head(states)
Year Alabama Arizona Arkansas California Colorado Connecticut Delaware
1 1900 1830 124 1314 1490 543 910 185
2 1901 1907 131 1341 1550 581 931 187
3 1902 1935 138 1360 1623 621 952 188
4 1903 1957 144 1384 1702 652 972 190
5 1904 1978 151 1419 1792 659 987 192
6 1905 2012 158 1447 1893 680 1010 194
Upvotes: 0
Views: 25
Reputation: 1180
This answer doesn't use dplyr, but I'm offering it because I know that it's very fast on large datasets.
It follows the same first step as the accepted answer: merge state names into the storms dataset. But then it does something clever (I stole the idea): it creates a matrix of row and column numbers, and then uses that to extract the elements from the "states" dataset that you want for the new column.
#Add the state names to storms
storms<-merge(storms, state_codes, by.x = 6, by.y = 2, all.x = T)
#Get row and column indexes for the elements in 'states'
r<-match(storms$year, states$year)
c<-match(storms$state.y, names(states)) #state.y was the name of the merged column
smat<-cbind(r,c)
#And grab them into a new vector
storms$population<-states[smat]
Upvotes: 0
Reputation: 24790
You didn't provide much data to test with, but this should do it.
First, join storms
to state_codes
, so that it will have state names that are in states
. We can rename yr
to match states$Year
at the same time.
Then pivot states
to be in long form.
Finally, join the new version of storms
to the long version of states
.
library(dplyr)
library(tidyr)
storms %>%
left_join(state_codes,by = c("state" = "Abbreviation")) %>%
rename(Year = yr) -> storms.with.names
states %>%
pivot_longer(-Year, names_to = "Name",
values_to = "Population") -> long.states
storms.with.names %>%
left_join(long.states) -> result
Upvotes: 1