Reputation: 29
I have two datasets. The first one is like this:
code | name
115 | A
120 | B
125 | A
130 | C
140 | A
The second one is like this:
code | Year
115 | 2015
140 | 2020
120 | 2017
130 | 2019
125 | 2011
Based on the column "code", I want to find the range of Year for each name like this:
code | Year | Range
115 | 2015 | 9
140 | 2020 | 9
120 | 2017 | 0
130 | 2019 | 0
125 | 2011 | 9
In fact, 9 is 2020-2011
My goal is to write a function with a minimum number of loops to work fast on a large number of data.
Upvotes: 0
Views: 91
Reputation: 78947
left_join
the dataframes by code
group_by
name
max
and min
df <- tribble(
~code, ~name,
115, "A",
120, "B",
125, "A",
130, "C",
140, "A")
df1 <- tribble(
~code, ~Year,
115, 2015,
140, 2020,
120, 2017,
130, 2019,
125, 2011)
df2 <- df1 %>%
left_join(df, by="code") %>%
group_by(name) %>%
mutate(Range = max(Year) - min(Year)) %>%
select(-name)
df2
Output:
code Year name Range
<dbl> <dbl> <chr> <dbl>
1 115 2015 A 9
2 140 2020 A 9
3 120 2017 B 0
4 130 2019 C 0
5 125 2011 A 9
Upvotes: 2
Reputation: 9868
I think you want to first merge the dataframes:
data<-merge(df1, df2, by = 'code')
Then you want the range of the year column, grouped by code:
library(dplyr)
data %>% group_by(name) %>% mutate(Range=diff(range(Year)))
This can all be done in a single call:
library(dplyr)
merge(df1, df2, by = 'code')%>%
group_by(name)%>%
mutate(Range=diff(range(Year)))
code Year name Range
<dbl> <dbl> <chr> <dbl>
1 115 2015 A 9
2 140 2020 A 9
3 120 2017 B 0
4 130 2019 C 0
5 125 2011 A 9
Upvotes: 2