Add multiple selects in one dataset

Question

I have the dataset below, in which I consolidate the variables Mk_Cap, Exports and Money_Supply, but each of these variables has a different Unit.

df <- data.frame(Mes=c("Jan","Fev","Mar","Abr","Mai",
                       "Jan","Fev","Mar","Abr","Mai",
                       "Jan","Fev","Mar","Abr","Mai"),
                 Ano=c(2005,2006,2007,2008,2009,
                       2005,2006,2007,2008,2009,
                       2005,2006,2007,2008,2009),
                 Mk_Cap=c(11:15,116:120,1111:1115),
                 Exports=c(21:25,146:150,1351:1355),
                 Money_Supply=c(31:35,546:550,2111:2115),
                 Unit=c("USD","USD","USD","USD","USD","200=10",
                        "200=10","200=10","200=10","200=10",
                        "CNY","CNY","CNY","CNY","CNY"))

Currently I consolidate as follows:

library(dplyr)

# Filter on Unit first, then keep only the columns needed from each subset
Money_Supply <- df %>% dplyr::filter(Unit == "USD") %>% dplyr::select(Ano, Mes, Money_Supply)
Mk_Cap <- df %>% dplyr::filter(Unit == "200=10") %>% dplyr::select(Mk_Cap)
Exports <- df %>% dplyr::filter(Unit == "CNY") %>% dplyr::select(Exports)

# Bind the three subsets side by side (relies on all three having the same row order)
Consolidado <- base::cbind(Money_Supply, Mk_Cap, Exports)

I doubt this is the best way to do it, but it is the only way I have found so far. This example has only a few variables, but in practice I do this for more than 30 variables, which is extremely laborious. An easier way would be ideal.

TarJae · Accepted Answer

A solution with dplyr: there is a pattern in the data frame. After sorting by year, each year has three rows, and each of the three columns of interest (Money_Supply, Mk_Cap, Exports) holds its wanted value in the first, second or third row respectively. First reorder the columns, then arrange by year, then use lead() to shift Mk_Cap and Exports up. Finally, group the rows in blocks of three and keep the first row of each block (id == 1).

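library(dplyr)

df1 <- df %>%
  select(Ano, Mes, Money_Supply, Mk_Cap, Exports) %>%
  arrange(Ano) %>%                                       # three consecutive rows per year
  mutate(Mk_Cap = lead(Mk_Cap, order_by = Ano)) %>%      # pull Mk_Cap up from the second row
  mutate(Exports = lead(Exports, 2, order_by = Ano)) %>% # pull Exports up from the third row
  mutate(group = rep(row_number(), each = 3, length.out = n())) %>% # label blocks of three rows
  group_by(group) %>%
  mutate(id = row_number()) %>%                          # position within each block
  filter(id == 1) %>%                                    # keep the first row of each block
  ungroup() %>%
  select(-group, -id)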
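
To see the shifting in isolation, here is a minimal sketch on a single three-row block. The data frame `block` is a hypothetical helper that is not part of the original code; it holds the three rows of Ano 2005 as they appear after arrange(Ano), with values copied from the example data.

```r
library(dplyr)

# Hypothetical toy block: the three rows of Ano 2005 after arrange(Ano),
# in the order USD, 200=10, CNY (values taken from the example data).
block <- data.frame(
  Unit         = c("USD", "200=10", "CNY"),
  Money_Supply = c(31, 546, 2111),
  Mk_Cap       = c(11, 116, 1111),
  Exports      = c(21, 146, 1351)
)

shifted <- block %>%
  mutate(
    Mk_Cap  = lead(Mk_Cap),     # each row takes the value one row below
    Exports = lead(Exports, 2)  # each row takes the value two rows below
  )

# The first row now carries the USD Money_Supply, the 200=10 Mk_Cap
# and the CNY Exports; the rows below are padded with NA.
shifted[1, ]
```

Only the first row of each block is meaningful after the shift, which is why the pipeline keeps id == 1 and drops the rest.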

Data

df <- data.frame(Mes=c("Jan","Fev","Mar","Abr","Mai",
                       "Jan","Fev","Mar","Abr","Mai",
                       "Jan","Fev","Mar","Abr","Mai"),
                 Ano=c(2005,2006,2007,2008,2009,
                       2005,2006,2007,2008,2009,
                       2005,2006,2007,2008,2009),
                 Mk_Cap=c(11:15,116:120,1111:1115), 
                 Exports=c(21:25,146:150,1351:1355),
                 Money_Supply=c(31:35,546:550,2111:2115),
                 Unit=c("USD","USD","USD","USD","USD","200=10",
                        "200=10","200=10","200=10","200=10",
                        "CNY","CNY","CNY","CNY","CNY"))

Edit: To clarify my point and the simplicity of the pattern in the data:

# slightly simplified code
df1 <- df %>% 
  arrange(Ano) %>% 
  mutate(Mk_Cap = lead(Mk_Cap, order_by = Ano)) %>% 
  mutate(Exports = lead(Exports, 2, order_by = Ano)) %>% 
  group_by(Ano) %>% 
  mutate(id = row_number()) %>% 
  filter(id ==1) %>%
  ungroup() %>%
  select(Ano, Mes, Money_Supply, Mk_Cap, Exports, -id, -Unit)

Consider your data frame after arrange(Ano), as in Fig. 1:

You have 5 values of Ano (orange): 2005 to 2009
Each Ano has 1 Mes (purple): 2005 = Jan, 2006 = Fev, 2007 = Mar, 2008 = Abr, 2009 = Mai
Each combination of Ano and Mes has 3 Unit values (blue): 2005 & Jan = USD, 200=10, CNY; 2006 & Fev = USD, 200=10, CNY; etc.
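
The three points above can be checked directly on the data before relying on row positions. This is a minimal sketch: `df` is rebuilt compactly with rep() (Ano becomes an integer here, which does not matter for the check), and `pattern`, `n_rows` and `units` are names introduced only for illustration.

```r
library(dplyr)

# Same data as in the question, written compactly with rep().
df <- data.frame(
  Mes          = rep(c("Jan", "Fev", "Mar", "Abr", "Mai"), times = 3),
  Ano          = rep(2005:2009, times = 3),
  Mk_Cap       = c(11:15, 116:120, 1111:1115),
  Exports      = c(21:25, 146:150, 1351:1355),
  Money_Supply = c(31:35, 546:550, 2111:2115),
  Unit         = rep(c("USD", "200=10", "CNY"), each = 5)
)

# One summary row per Ano/Mes pair: how many rows it has and
# in which order the units appear after sorting by Ano.
pattern <- df %>%
  arrange(Ano) %>%
  group_by(Ano, Mes) %>%
  summarise(n_rows = n(),
            units  = paste(Unit, collapse = ", "),
            .groups = "drop")

# Stop early if any pair does not have exactly three rows
# or if the Unit order differs between pairs.
stopifnot(all(pattern$n_rows == 3), n_distinct(pattern$units) == 1)
pattern
```

If the stopifnot() call fails on real data, the lead() approach should not be used as is, because it depends on every block having the same three units in the same order.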

In your desired output you want to condense the 3 rows of each Ano, one per Unit, into 1 row with Ano, Mes and the corresponding values of Money_Supply, Mk_Cap and Exports.

This can be achieved with the lead() function (see Fig. 1):

Money_Supply: no code is necessary, the value is already in the first row (green)
Mk_Cap: mutate(Mk_Cap = lead(Mk_Cap, order_by = Ano)) shifts the value up one row (yellow arrow)
Exports: mutate(Exports = lead(Exports, 2, order_by = Ano)) shifts the value up two rows (red arrow)
group_by(Ano): group by Ano
mutate(id = row_number()): assign a row id within each group
filter(id == 1): keep the first row of each group
Finally, select(Ano, Mes, Money_Supply, Mk_Cap, Exports, -id, -Unit) sets the column order and removes the unnecessary columns.
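
For the practical case with more than 30 variables, writing one lead() call per column gets tedious. The sketch below is a different technique from the lead() approach above, not a variant of it: it matches rows by the keys Ano and Mes instead of by row position. The lookup vector `unit_for` is a hypothetical helper, filled in here with the variable-to-Unit pairs used in the question, and `df` is rebuilt compactly so the block runs on its own.

```r
library(dplyr)

# Same data as in the question, written compactly with rep().
df <- data.frame(
  Mes          = rep(c("Jan", "Fev", "Mar", "Abr", "Mai"), times = 3),
  Ano          = rep(2005:2009, times = 3),
  Mk_Cap       = c(11:15, 116:120, 1111:1115),
  Exports      = c(21:25, 146:150, 1351:1355),
  Money_Supply = c(31:35, 546:550, 2111:2115),
  Unit         = rep(c("USD", "200=10", "CNY"), each = 5)
)

# Hypothetical lookup: the Unit whose rows hold the wanted value
# of each variable. Extend this vector for further variables.
unit_for <- c(Money_Supply = "USD", Mk_Cap = "200=10", Exports = "CNY")

# One small data frame per variable: keys plus that variable,
# restricted to the rows of its Unit.
pieces <- lapply(names(unit_for), function(v) {
  df %>%
    filter(Unit == unit_for[[v]]) %>%
    select(Ano, Mes, all_of(v))
})

# Join the pieces on the keys, so row order no longer matters.
consolidado <- Reduce(
  function(x, y) inner_join(x, y, by = c("Ano", "Mes")),
  pieces
)
consolidado
```

This assumes that each Ano/Mes pair occurs once per Unit, as in the example data. With duplicated keys the joins would multiply rows.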
