shokoufa zeynali
shokoufa zeynali

Reputation: 29

Is there any short way in R to find min Max based on below data?

I have two datasets. The first one is like this:

code | name
115  | A
120  | B
125  | A
130  | C
140  | A

The second one is like this:

code | Year
115  | 2015
140  | 2020
120  | 2017
130  | 2019
125  | 2011

Based on the column "code", I want to find the range of Year for each name like this:

code | Year | Range
115  | 2015 | 9
140  | 2020 | 9
120  | 2017 | 0
130  | 2019 | 0
125  | 2011 | 9

In fact, 9 is 2020-2011

My goal is to write a function with a minimum number of loops to work fast on a large number of data.

Upvotes: 0

Views: 91

Answers (2)

TarJae
TarJae

Reputation: 78947

  1. left_join the dataframes by code
  2. group_by name
  3. use max and min
df <- tribble(
~code, ~name,
115, "A",
120, "B",
125, "A",
130, "C",
140, "A")

df1 <- tribble(
~code, ~Year,
115, 2015,
140, 2020,
120, 2017,
130, 2019,
125, 2011)

df2 <- df1 %>% 
  left_join(df, by="code") %>% 
  group_by(name) %>% 
  mutate(Range = max(Year) - min(Year)) %>%
  select(-name)
df2

Output:

   code  Year name  Range
  <dbl> <dbl> <chr> <dbl>
1   115  2015 A         9
2   140  2020 A         9
3   120  2017 B         0
4   130  2019 C         0
5   125  2011 A         9

Upvotes: 2

GuedesBF
GuedesBF

Reputation: 9868

I think you want to first merge the dataframes:

data<-merge(df1, df2, by = 'code')

Then you want the range of the year column, grouped by code:

library(dplyr)

data %>% group_by(name) %>% mutate(Range=diff(range(Year)))

This can all be done in a single call:

library(dplyr)

merge(df1, df2, by = 'code')%>%
group_by(name)%>%
mutate(Range=diff(range(Year)))

   code  Year name  Range
  <dbl> <dbl> <chr> <dbl>
1   115  2015 A         9
2   140  2020 A         9
3   120  2017 B         0
4   130  2019 C         0
5   125  2011 A         9

Upvotes: 2

Related Questions