Reputation: 477
I have following Dataframe in R.
Type Category Mar-20 Apr-20 Feb-20
A RTSF 25 (25.00%) 30 (30.00%) 50 (50.00%)
A PRRT 30 (30.00%) 30 (30.00%) 20 (20.00%)
A ATTR 15 (15.00%) 30 (30.00%) 15 (15.00%)
A STRF 30 (30.00%) 10 (10.00%) 15 (15.00%)
B RTSF 20 (20.00%) 25 (38.46%) 50 (50.00%)
B PRRT 40 (40.00%) 5 (7.69%) 0 (0.00%)
B ATTR 10 (10.00%) 25 (38.46%) 0 (0.00%)
B STRF 30 (30.00%) 10 (15.38%) 50 (50.00%)
I want to add the Total
column for every Type
and arrange the Months in the ascending order from left to right.
Dput(df)<-
structure(list(Type = c("A", "A", "A", "A",
"B", "B", "B", "B"), category = c("RTSF",
"PRRT", "ATTR", "STRF", "RTSF",
"PRRT", "ATTR", "STRF"), `Apr-21` = c("1544 (47.7%)",
"330 (10.2%)", "771 (23.8%)", "589 (18.2%)", "60 (2.38%)", "2448 (97.30%)",
"0 (0.00%)", "8 (0.32%)"), `Dec-20` = c("2547 (50.93%)", "15 (0.30%)",
"2435 (48.69%)", "4 (0.08%)", "680 (97.8%)", "0 (0.0%)", "15 (2.2%)",
"0 (0.0%)"), `Feb-21` = c("2099 (50.24%)", "14 (0.34%)", "2064 (49.40%)",
"1 (0.02%)", "1702 (98.44%)", "19 (1.10%)", "7 (0.40%)", "1 (0.06%)"
), `Jan-21` = c("2084 (46.71%)", "20 (0.45%)", "2352 (52.71%)",
"6 (0.13%)", "912 (99.35%)", "1 (0.11%)", "5 (0.54%)", "0 (0.00%)"
), `Mar-21` = c("1111 (33.5%)", "575 (17.3%)", "876 (26.4%)",
"753 (22.7%)", "198 (7.8%)", "2239 (88.6%)", "1 (0.0%)", "90 (3.6%)"
)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
))
REQUIRED OUTPUT:
Type Category Feb-20 Mar-20 Apr-20
A RTSF 50 (50.00%) 25 (25.00%) 30 (30.00%)
A PRRT 20 (20.00%) 30 (30.00%) 30 (30.00%)
A ATTR 15 (15.00%) 15 (15.00%) 30 (30.00%)
A STRF 15 (15.00%) 30 (30.00%) 10 (10.00%)
Total - 100 (100.00%) 100 (100.00%) 100 (100.00%)
B RTSF 50 (50.00%) 20 (20.00%) 25 (38.46%)
B PRRT 0 (0.00%) 40 (40.00%) 5 (7.69%)
B ATTR 0 (0.00%) 10 (10.00%) 25 (38.46%)
B STRF 50 (50.00%) 30 (30.00%) 10 (15.38%)
Total - 100 (100.00%) 100 (100.00%) 65 (100.00%)
Upvotes: 1
Views: 417
Reputation: 78927
Here is another working solution:
Logic:
clean_names
with janitor packageseparate
counts from percentmap_df
from purrr
with adorn_totals
adorn_totals
library(tidyverse)
library(janitor)
df %>%
clean_names() %>%
pivot_longer(
cols = 3:7,
names_to = "Date",
values_to = "Values"
) %>%
separate(Values, c("Values_N","Values_Percent"), sep=" ") %>%
pivot_wider(
names_from = Date,
values_from = c(Values_N, Values_Percent)
) %>%
type.convert(as.is=TRUE) %>%
split( .[,"type"] ) %>%
purrr::map_df(., janitor::adorn_totals) %>%
select(Type=type, Category=category,
`Dez-20_Count`=Values_N_dec_20,
`Dez-20_Percent`=Values_Percent_dec_20,
`Jan-21_Count`=Values_N_jan_21,
`Jan-21_Percent`=Values_Percent_jan_21,
`Feb-21_Count`=Values_N_feb_21,
`Feb-21_Percent`=Values_Percent_feb_21,
`Mar-21_Count`=Values_N_mar_21,
`Mar-21_Percent`=Values_Percent_mar_21,
`Apr-21_Count`=Values_N_apr_21,
`Apr-21_Percent`=Values_Percent_apr_21)
Output:
Type Category Dez-20_Count Dez-20_Percent Jan-21_Count Jan-21_Percent Feb-21_Count Feb-21_Percent Mar-21_Count Mar-21_Percent Apr-21_Count Apr-21_Percent
A RTSF 2547 (50.93%) 2084 (46.71%) 2099 (50.24%) 1111 (33.5%) 1544 (47.7%)
A PRRT 15 (0.30%) 20 (0.45%) 14 (0.34%) 575 (17.3%) 330 (10.2%)
A ATTR 2435 (48.69%) 2352 (52.71%) 2064 (49.40%) 876 (26.4%) 771 (23.8%)
A STRF 4 (0.08%) 6 (0.13%) 1 (0.02%) 753 (22.7%) 589 (18.2%)
Total - 5001 - 4462 - 4178 - 3315 - 3234 -
B RTSF 680 (97.8%) 912 (99.35%) 1702 (98.44%) 198 (7.8%) 60 (2.38%)
B PRRT 0 (0.0%) 1 (0.11%) 19 (1.10%) 2239 (88.6%) 2448 (97.30%)
B ATTR 15 (2.2%) 5 (0.54%) 7 (0.40%) 1 (0.0%) 0 (0.00%)
B STRF 0 (0.0%) 0 (0.00%) 1 (0.06%) 90 (3.6%) 8 (0.32%)
Total - 695 - 918 - 1729 - 2528 - 2516 -
Upvotes: 0
Reputation: 508
You need some sort of string-parsing function that also handles the aggregation:
library(tidyverse)
data <- structure(list(Type = c("A", "A", "A", "A",
"B", "B", "B", "B"), category = c("RTSF",
"PRRT", "ATTR", "STRF", "RTSF",
"PRRT", "ATTR", "STRF"), `Apr-21` = c("1544 (47.7%)",
"330 (10.2%)", "771 (23.8%)", "589 (18.2%)", "60 (2.38%)", "2448 (97.30%)",
"0 (0.00%)", "8 (0.32%)"), `Dec-20` = c("2547 (50.93%)", "15 (0.30%)",
"2435 (48.69%)", "4 (0.08%)", "680 (97.8%)", "0 (0.0%)", "15 (2.2%)",
"0 (0.0%)"), `Feb-21` = c("2099 (50.24%)", "14 (0.34%)", "2064 (49.40%)",
"1 (0.02%)", "1702 (98.44%)", "19 (1.10%)", "7 (0.40%)", "1 (0.06%)"
), `Jan-21` = c("2084 (46.71%)", "20 (0.45%)", "2352 (52.71%)",
"6 (0.13%)", "912 (99.35%)", "1 (0.11%)", "5 (0.54%)", "0 (0.00%)"
), `Mar-21` = c("1111 (33.5%)", "575 (17.3%)", "876 (26.4%)",
"753 (22.7%)", "198 (7.8%)", "2239 (88.6%)", "1 (0.0%)", "90 (3.6%)"
)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
))
parse_field <- function(s) {
res <- str_match(s, "([0-9]+) \\(([0-9]+\\.[0-9]+)%\\)")
x <- as.numeric(res[, 2])
y <- as.numeric(res[, 3])
sprintf("%d (%02.f%%)", sum(x), sum(y))
}
cnames <- data %>%
select(-Type, -category) %>%
names()
totals <- data %>%
as_tibble() %>%
group_by(Type) %>%
summarize_at(cnames, parse_field) %>%
mutate(tmpType = Type, category = "-", Type = "Total")
combined <- data %>%
mutate(tmpType = Type) %>%
bind_rows(totals) %>%
group_by(tmpType) %>%
arrange(Type, .by_group = TRUE) %>%
ungroup() %>%
select(-tmpType)
sorted_names <- cnames[order(as.Date(paste("01", cnames), "%d %b-%y"))]
select(combined, Type, category, sorted_names)
output:
# A tibble: 10 x 7
Type category `Dec-20` `Jan-21` `Feb-21` `Mar-21` `Apr-21`
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 A RTSF 2547 (50.93… 2084 (46.71… 2099 (50.24… 1111 (33.5… 1544 (47.7…
2 A PRRT 15 (0.30%) 20 (0.45%) 14 (0.34%) 575 (17.3%) 330 (10.2%)
3 A ATTR 2435 (48.69… 2352 (52.71… 2064 (49.40… 876 (26.4%) 771 (23.8%)
4 A STRF 4 (0.08%) 6 (0.13%) 1 (0.02%) 753 (22.7%) 589 (18.2%)
5 Total - 5001 (100%) 4462 (100%) 4178 (100%) 3315 (100%) 3234 (100%)
6 B RTSF 680 (97.8%) 912 (99.35%) 1702 (98.44… 198 (7.8%) 60 (2.38%)
7 B PRRT 0 (0.0%) 1 (0.11%) 19 (1.10%) 2239 (88.6… 2448 (97.3…
8 B ATTR 15 (2.2%) 5 (0.54%) 7 (0.40%) 1 (0.0%) 0 (0.00%)
9 B STRF 0 (0.0%) 0 (0.00%) 1 (0.06%) 90 (3.6%) 8 (0.32%)
10 Total - 695 (100%) 918 (100%) 1729 (100%) 2528 (100%) 2516 (100%)
Explanation
parse_field
will match the leading number and the number in the parentheses. Since str_match
is vectorized and the argument s
is a vector representing one of the columns, it will return a matrix which can then be summed and reformatted using sprintf
.Type
column, and aggregate within each group using parse_field
for each column you care about. Change the category to a dash and Type
to "Total" since that's how you want these displayed in the final output. Also make a new column holding the original type for step 3 called tmpType
.tmpType
to hold the type analogously to what was done at the end of 2. Append the totals to the original data (at this point they are at the bottom so we need to sort). Group by tmpType
and sort by Type
, finally ungroup and remove the tmpType
column.Notes
tmpType
column is necessary because grouping on Type
will also make Total
its own group which will keep it at the bottom.arrange
will sort Total
after A
, B
, etc. If you somehow have Types that go beyond T, you would need to make another column as the sort key column where the original data has all 0 and the totals data has all 1 (and sort 0 before 1 within each category).paste
call in the line that sorts the names seems to be necessary since as.Date
will return NA
if you only give it a month and year. Appending 01
to each date basically just means parse the date at the beginning of each month, which seems fine for your case.Upvotes: 2