user14808367
user14808367

Reputation:

creating an extra column based on two other dataframes

I have three datasets

one containing a bunch of information about storms.

one that contains full names of the cities and the abbreviations.

and one that contains the year and population for each state.

What I want to do is to add a column to the first dataframe storms called population that contains population per year for each state using the other two dataframes state_codes and states.

Can anyone point me in the right direction? Below some sample data

> head(storms)
  num   yr mo dy     time state magnitude injuries fatalities crop_loss
1   1 1950  1  3 11:00:00    MO         3        3          0         0
2   1 1950  1  3 11:10:00    IL         3        0          0         0
3   2 1950  1  3 11:55:00    IL         3        3          0         0
4   3 1950  1  3 16:00:00    OH         1        1          0         0
5   4 1950  1 13 05:25:00    AR         3        1          1         0
6   5 1950  1 25 19:30:00    MO         2        5          0         0

> head(state_codes)
        Name Abbreviation
1    Alabama           AL
2     Alaska           AK
3    Arizona           AZ
4   Arkansas           AR
5 California           CA
6   Colorado           CO


head(states)
Year Alabama Arizona Arkansas California Colorado Connecticut Delaware
1 1900    1830     124     1314       1490      543         910      185
2 1901    1907     131     1341       1550      581         931      187
3 1902    1935     138     1360       1623      621         952      188
4 1903    1957     144     1384       1702      652         972      190
5 1904    1978     151     1419       1792      659         987      192
6 1905    2012     158     1447       1893      680        1010      194

Upvotes: 0

Views: 25

Answers (2)

John
John

Reputation: 1180

This answer doesn't use dplyr, but I'm offering it because I know that it's very fast on large datasets.

It follows the same first step as the accepted answer: merge state names into the storms dataset. But then it does something clever (I stole the idea): it creates a matrix of row and column numbers, and then uses that to extract the elements from the "states" dataset that you want for the new column.

#Add the state names to storms
storms<-merge(storms, state_codes, by.x = 6, by.y = 2, all.x = T)

#Get row and column indexes for the elements in 'states'
r<-match(storms$year, states$year)
c<-match(storms$state.y, names(states)) #state.y was the name of the merged column
smat<-cbind(r,c)

#And grab them into a new vector
storms$population<-states[smat]

Upvotes: 0

Ian Campbell
Ian Campbell

Reputation: 24790

You didn't provide much data to test with, but this should do it.

First, join storms to state_codes, so that it will have state names that are in states. We can rename yr to match states$Year at the same time.

Then pivot states to be in long form.

Finally, join the new version of storms to the long version of states.

library(dplyr)
library(tidyr)
storms %>%
  left_join(state_codes,by = c("state" = "Abbreviation")) %>%
  rename(Year = yr) -> storms.with.names

states %>%
  pivot_longer(-Year, names_to = "Name",
               values_to = "Population") -> long.states

storms.with.names %>%
  left_join(long.states) -> result

Upvotes: 1

Related Questions