Reputation: 105
I have the dataset below and in it I consolidate the categories Mk_Cap, Exports and Money_Supply, but each of these grids has a different Unit.
df <- data.frame(Mes=c("Jan","Fev","Mar","Abr","Mai",
"Jan","Fev","Mar","Abr","Mai",
"Jan","Fev","Mar","Abr","Mai"),
Ano=c(2005,2006,2007,2008,2009,
2005,2006,2007,2008,2009,
2005,2006,2007,2008,2009),
Mk_Cap=c(11:15,116:120,1111:1115),
Exports=c(21:25,146:150,1351:1355),
Money_Supply=c(31:35,546:550,2111:2115),
Unit=c("USD","USD","USD","USD","USD","200=10",
"200=10","200=10","200=10","200=10",
"CNY","CNY","CNY","CNY","CNY"))
Today I am consolidating as follows:
library(dplyr)
Money_Supply <- df %>% dplyr::select(Ano, Mes,Money_Supply) %>% dplyr::filter(df$Unit == "USD")
Mk_Cap <- df %>% dplyr::select(Mk_Cap) %>% dplyr::filter(df$Unit == "200=10")
Exports <- df %>% dplyr::select(Exports) %>% dplyr::filter(df$Unit == "CNY")
Consolidado <- base::cbind(Money_Supply,Mk_Cap,Exports)
I believe that it is not the most correct way to do this, but today it is the way that I found, in this example that I passed there are few occurrences, but in the practical case I do this in more than 30 variables which is extremely costly, if there is any way easier would be ideal.
Upvotes: 0
Views: 53
Reputation: 78927
A solution with dplyr:
There is a pattern in the dataframe. Each year has three rows.
Of the three column of interest Money_Supply, Mk_Cap, Exports
each variable is in the first, second or third row.
First reorder the columns, then arrange by year, then lead the columns of interest. Then group and filter by id==1
.
df1 <- df %>%
select(Ano, Mes, Money_Supply, Mk_Cap, Exports) %>%
arrange(Ano) %>%
mutate(Mk_Cap = lead(Mk_Cap, order_by = Ano)) %>%
mutate(Exports = lead(Exports, 2, order_by = Ano)) %>%
mutate(group = rep(row_number(), each=3, length.out = n())) %>%
group_by(group) %>%
mutate(id = row_number()) %>%
filter(id ==1) %>%
ungroup() %>%
select(-group, -id)
Data
df <- data.frame(Mes=c("Jan","Fev","Mar","Abr","Mai",
"Jan","Fev","Mar","Abr","Mai",
"Jan","Fev","Mar","Abr","Mai"),
Ano=c(2005,2006,2007,2008,2009,
2005,2006,2007,2008,2009,
2005,2006,2007,2008,2009),
Mk_Cap=c(11:15,116:120,1111:1115),
Exports=c(21:25,146:150,1351:1355),
Money_Supply=c(31:35,546:550,2111:2115),
Unit=c("USD","USD","USD","USD","USD","200=10",
"200=10","200=10","200=10","200=10",
"CNY","CNY","CNY","CNY","CNY"))
Edit: Try to clarify my point and the simplicity of the pattern in the data:
# slightly simplified code
df1 <- df %>%
arrange(Ano) %>%
mutate(Mk_Cap = lead(Mk_Cap, order_by = Ano)) %>%
mutate(Exports = lead(Exports, 2, order_by = Ano)) %>%
group_by(Ano) %>%
mutate(id = row_number()) %>%
filter(id ==1) %>%
ungroup() %>%
select(Ano, Mes, Money_Supply, Mk_Cap, Exports, -id, -Unit)
If you consider your dataframe like Fig1 with arrange(Ano)
:
Ano
(orange): 2005-2009Ano
you have 1 Mes
(purple): In 2005 = Jan, 2006 = Fev, 2007 = Mar, 2008 = Abr, 2009 = MaiAno
and Mes
you have 3 Unit
(blue): In 2005 & Jan = USD, 200=10, CNY ; In 2006 & Fev = USD, 200=10, CNY ; etc...In your desired output you wish to have:
to condense the
3 rows of one Ano
with 3 different Unit
to
1 row with Ano
, Mes
and the corresponding values of Money_Supply, Mk_Cap, Exports
This can be achieved by lead
function (see Fig.1):
Money_Supply
: no code necessary is already in the first row (color green)Mk_Cap
: mutate(Mk_Cap = lead(Mk_Cap, order_by = Ano))
yellow arrowExports
: mutate(Exports = lead(Exports, 2, order_by = Ano))
red arrowgroup_by(Ano)
Group by Ano
mutate(id = row_number())
Assign unique id
within each groupfilter(id ==1)
Filter the 1 row in each groupselect(Ano, Mes, Money_Supply, Mk_Cap, Exports, -id, -Unit)
Upvotes: 2
Reputation: 1751
I think a simple way would be filtering your dataset by the Unit
column before doing any other operations. Store the variations in a list by performing:
unit_variations <- lapply(unique(df$Unit), function(x) {
return(df[df$Unit == x, ])
})
names(unit_variations) <- unique(df$Unit)
Then, to make your Consolidado
dataframe, select which variables you want from which unit variations. Say:
vars <- c("Money_Supply", "Mk_Cap", "Exports")
unit <- c("USD", "200=10", "CNY")
Consolidado <- mapply(
FUN = function(var, unit) {
return(unit_variations[[unit]][[var]])
},
vars,
unit
)
I used a list because, from what you described, I cannot assume that the number of rows for each type of Unit
will always be the same, so a list allows for more flexibility. I also did not include month and year, for the same reason.
Upvotes: 1