Roney Wesley Galan
Roney Wesley Galan

Reputation: 105

Add multiple selects in one dataset

I have the dataset below and in it I consolidate the categories Mk_Cap, Exports and Money_Supply, but each of these grids has a different Unit.

 df <- data.frame(Mes=c("Jan","Fev","Mar","Abr","Mai",
               "Jan","Fev","Mar","Abr","Mai",
               "Jan","Fev","Mar","Abr","Mai"),
         Ano=c(2005,2006,2007,2008,2009,
               2005,2006,2007,2008,2009,
               2005,2006,2007,2008,2009),
         Mk_Cap=c(11:15,116:120,1111:1115), 
         Exports=c(21:25,146:150,1351:1355),
         Money_Supply=c(31:35,546:550,2111:2115),
         Unit=c("USD","USD","USD","USD","USD","200=10",
                "200=10","200=10","200=10","200=10",
                "CNY","CNY","CNY","CNY","CNY"))

enter image description here

Today I am consolidating as follows:

library(dplyr)
Money_Supply <- df %>% dplyr::select(Ano, Mes,Money_Supply) %>% dplyr::filter(df$Unit == "USD")
Mk_Cap <- df %>% dplyr::select(Mk_Cap) %>% dplyr::filter(df$Unit == "200=10")
Exports <- df %>% dplyr::select(Exports) %>% dplyr::filter(df$Unit == "CNY")

Consolidado <- base::cbind(Money_Supply,Mk_Cap,Exports)

enter image description here

I believe that it is not the most correct way to do this, but today it is the way that I found, in this example that I passed there are few occurrences, but in the practical case I do this in more than 30 variables which is extremely costly, if there is any way easier would be ideal.

Upvotes: 0

Views: 53

Answers (2)

TarJae
TarJae

Reputation: 78927

A solution with dplyr: There is a pattern in the dataframe. Each year has three rows. Of the three column of interest Money_Supply, Mk_Cap, Exports each variable is in the first, second or third row. First reorder the columns, then arrange by year, then lead the columns of interest. Then group and filter by id==1.

df1 <- df %>%
  select(Ano, Mes, Money_Supply, Mk_Cap, Exports) %>% 
  arrange(Ano) %>% 
  mutate(Mk_Cap = lead(Mk_Cap, order_by = Ano)) %>% 
  mutate(Exports = lead(Exports, 2, order_by = Ano)) %>% 
  mutate(group = rep(row_number(), each=3, length.out = n())) %>% 
  group_by(group) %>% 
  mutate(id = row_number()) %>% 
  filter(id ==1) %>%
  ungroup() %>% 
  select(-group, -id)

Data

df <- data.frame(Mes=c("Jan","Fev","Mar","Abr","Mai",
                       "Jan","Fev","Mar","Abr","Mai",
                       "Jan","Fev","Mar","Abr","Mai"),
                 Ano=c(2005,2006,2007,2008,2009,
                       2005,2006,2007,2008,2009,
                       2005,2006,2007,2008,2009),
                 Mk_Cap=c(11:15,116:120,1111:1115), 
                 Exports=c(21:25,146:150,1351:1355),
                 Money_Supply=c(31:35,546:550,2111:2115),
                 Unit=c("USD","USD","USD","USD","USD","200=10",
                        "200=10","200=10","200=10","200=10",
                        "CNY","CNY","CNY","CNY","CNY"))

Edit: Try to clarify my point and the simplicity of the pattern in the data:

# slightly simplified code
df1 <- df %>% 
  arrange(Ano) %>% 
  mutate(Mk_Cap = lead(Mk_Cap, order_by = Ano)) %>% 
  mutate(Exports = lead(Exports, 2, order_by = Ano)) %>% 
  group_by(Ano) %>% 
  mutate(id = row_number()) %>% 
  filter(id ==1) %>%
  ungroup() %>%
  select(Ano, Mes, Money_Supply, Mk_Cap, Exports, -id, -Unit)

If you consider your dataframe like Fig1 with arrange(Ano): enter image description here

  1. You have 5 Ano (orange): 2005-2009
  2. In each Ano you have 1 Mes(purple): In 2005 = Jan, 2006 = Fev, 2007 = Mar, 2008 = Abr, 2009 = Mai
  3. In each Ano and Mes you have 3 Unit (blue): In 2005 & Jan = USD, 200=10, CNY ; In 2006 & Fev = USD, 200=10, CNY ; etc...

In your desired output you wish to have:

to condense the 3 rows of one Ano with 3 different Unit to 1 row with Ano, Mes and the corresponding values of Money_Supply, Mk_Cap, Exports

This can be achieved by lead function (see Fig.1):

  1. In Money_Supply: no code necessary is already in the first row (color green)
  2. In Mk_Cap: mutate(Mk_Cap = lead(Mk_Cap, order_by = Ano)) yellow arrow
  3. In Exports: mutate(Exports = lead(Exports, 2, order_by = Ano)) red arrow
  4. group_by(Ano) Group by Ano
  5. mutate(id = row_number()) Assign unique id within each group
  6. filter(id ==1) Filter the 1 row in each group
  7. Finally tweak the order of columns and remove unnesseccary columns. select(Ano, Mes, Money_Supply, Mk_Cap, Exports, -id, -Unit)

enter image description here

Upvotes: 2

eduardokapp
eduardokapp

Reputation: 1751

I think a simple way would be filtering your dataset by the Unit column before doing any other operations. Store the variations in a list by performing:

unit_variations <- lapply(unique(df$Unit), function(x) {
    return(df[df$Unit == x, ])
})
names(unit_variations) <- unique(df$Unit)

Then, to make your Consolidado dataframe, select which variables you want from which unit variations. Say:

vars <- c("Money_Supply", "Mk_Cap", "Exports")
unit <- c("USD", "200=10", "CNY")

Consolidado <- mapply(
    FUN = function(var, unit) {
        return(unit_variations[[unit]][[var]])
    },
    vars,
    unit
)

I used a list because, from what you described, I cannot assume that the number of rows for each type of Unit will always be the same, so a list allows for more flexibility. I also did not include month and year, for the same reason.

Upvotes: 1

Related Questions