Reputation: 109
Due to some coding, the dataset is overly ordered. I would like to keep the columns, but would like the sum up every Action (= 21 unique terms). In order that it fits in a single row that counts the occurrences in each quartile (Q) of the procedure.
The goal is to tally predictors per instance of Q. The data is sorted in 21 Actions, and 4 Qs and 100 procedures. Overall tally is present, we want to cut in 4 pieces (time).
I also don't mind tallying all the Actions with removing Q as a column.
I want to have this, or this is what I expect from the result:
procedure`action 1 Q1` `action 1 Q2` `action 2 Q1` `action 2 Q2`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 4 2 2 3
2 2 2 2 1 5
3 3 .. .. .. ..
But my dataframe looks like this:
Q procedure `action 1 Q1` `action 1 Q2` `action 2 Q1` `action 2 Q2`
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Q1 1 4 0 0 0
2 Q1 1 0 0 2 0
3 Q1 1 0 0 0 0
4 Q1 1 0 0 0 0
5 Q2 1 0 2 0 0
6 Q2 1 0 0 0 3
7 Q2 1 0 0 0 0
8 Q2 1 0 0 0 0
9 Q1 2 2 0 0 0
10 Q1 2 0 0 1 0
11 Q1 2 0 0 0 0
12 Q1 2 0 0 0 0
13 Q2 2 0 2 0 0
14 Q2 2 0 0 0 5
15 Q2 2 0 0 0 0
16 Q2 2 0 0 0 0
# ... with 4 more variables: `action 3 Q1` <dbl>, `action 3 Q2` <dbl>, `action 4
# Q1` <dbl>, `action 4 Q2` <dbl>
I tried this, but got stuck when using spread() and gave the second dataframe example as output.
procedure <- rep(c(rep(1,10), rep(2,10)),2)
Q <- rep(rep(c(rep('Q1',5),rep('Q2',5)),2),2)
action <- rep(rep(paste('action', 1:4),5),2)
df <- data.frame(procedure, Q, action)
library(dplyr)
library(tidyr)
# We can group by procedure, Q and action, and then count the instance with tally().
df_long <- df %>% group_by(procedure, Q, action) %>% tally()
df_long$action.Q <- paste(df_long$action,df_long$Q)
# Now we can use the function spread to create wide dataframe with columns for each combination of Q and action:
df_wide <- df_long %>% spread(action.Q, n, fill=0) %>% select(-c(Q,action))
df_long
looks like this (after paste(action.Q)
):
# A tibble: 10 x 5
# Groups: procedure, Q [3]
procedure Q action n action.Q
<dbl> <fct> <fct> <int> <chr>
1 1 Q1 action 1 4 action 1 Q1
2 1 Q1 action 2 2 action 2 Q1
3 1 Q1 action 3 2 action 3 Q1
4 1 Q1 action 4 2 action 4 Q1
5 1 Q2 action 1 2 action 1 Q2
6 1 Q2 action 2 4 action 2 Q2
7 1 Q2 action 3 2 action 3 Q2
8 1 Q2 action 4 2 action 4 Q2
9 2 Q1 action 1 2 action 1 Q1
10 2 Q1 action 2 2 action 2 Q1
Source: Counting text values across different columns, in to new columns
Upvotes: 0
Views: 81
Reputation: 1253
I'm sure there are better ways, but starting from where you left of:
df_wide <- df_long %>% spread(action.Q, n, fill=0)
df_wide %>%
group_by(procedure) %>%
summarize(`action 1 Q1` = sum(`action 1 Q1`), `action 1 Q2` = sum(`action 1 Q2`), `action 2 Q1` = sum(`action 2 Q1`), `action 2 Q2` = sum(`action 2 Q2`))
Note that I changed operatie
into procedure
in your sample data set.
EDIT: Thanks to Ronak Shah, you can do the summarize
in a less manual manner:
df_wide %>%
group_by(procedure) %>%
summarize_at(vars(starts_with("action ")), sum)
Note the 'space' after action, to avoid matching the action
column itself.
Upvotes: 1