Reputation: 109
EDIT 1: Clarity and typo's, expanded more on example.
I have a dataset with one column (Action
) that has text values. I want to count the unique values (for a given Operatie
) and place these in a new column, according to their ID (== Operatie
). There are 21 unique values within Action
.
In the new data set it is important that the new column (which counts a single text value from Action
) is linked to the value of Q.Operatie
(has values Q1, Q2, Q3, Q4
) and Operatie
(1:100
).
Thus, if we take the first 4 rows in our example then we would have a column named Q1.Delegerend
== 2. Whereas the next column would be Q1.Goedaardig
== 1. Because we have 2 occurrences of Delegerend
and 1 occurrence of Goedaardig
for Operation == 1
. I ignore Instruerend
for this example.
This leads to 4 (Q1.X:Q4.X) columns with each having their unique count of text values within their respected range. Thus, Q1.Delegerend
, Q2.Delegerend
, Q3.Delegerend
, Q4.Delegerend
for the row of Operatie == 1
for Delegerend
(one of 21 unique). We might need to set up 'transition-matrices', hence the split of the observations.
An example of the raw data is shown below, the new data set will have columns for each unique value in a single row, see the example below the raw data.
Operatie Tijdstip Berekening.voor.D Minuut.van.de.Operatie Berekening.voor.F Q.Operatie Actor Responder Action Focus InterTeam
1 1 08:44:56 00:00 1 1% Q1 C OA Delegerend 1 b
2 1 08:45:43 00:00 2 2% Q1 C* AM Goedaardig 1 a
3 1 08:46:45 00:01 3 4% Q1 OA OA* Instruerend 3
4 1 08:47:10 00:02 3 4% Q1 C OA* Delegerend 1 b
5 1 08:48:03 00:03 4 6% Q1 C Team Onderwijzend 1 b
6 1 08:48:44 00:03 5 7% Q1 C Team Bewustwording 1 b
7 1 08:49:28 00:04 6 8% Q1 C* C Instruerend 1 b
8 1 08:50:30 00:05 7 9% Q1 C C* Onderwijzend 1 b
9 1 08:50:47 00:05 7 10% Q1 C AM Delegerend 1 a
10 1 08:51:47 00:06 8 11% Q1 C OA Instruerend 1 b
Thus, ultimately, I'd like to have one row (Operatie) with 21 columns with a frequency of that unique text (taken from column Action), sorted by the levels of Q.Operatie
. Yes, this will lead to a lot of columns, 21 unique values times 4. But that's fine.
Operatie Minuten Chirurg1 Chirurg2 Q1.Delegerend Q2.Delegerend Q3.Delegerend Q4.Delegerend Q1.Goedaardig
1 1 72 10 11 2 4 5 5
2 2 30 10 11 2 2 6 12
3 3 102 1 2 1 5 12 ...
4 4 212 2 NA 3 13 13
5 5 37 4 NA 1 2 ...
6 6 57 2 NA 3 9
7 7 120 3 NA 1 9
8 8 146 3 NA 1 6
9 9 143 2 9 3 10
10 10 189 9 2 3 12
So I tried making a list for dplyr to work with, see below. I didn't manage to get it to work fluently. I am under the impression that it is possible to call a list to count unique values over, not sure how to write that up using dplyr
. I looked at a few posts, but I couldn't find anything about counting multiple rows in order to migrate it to a new dataset. However, the latter is easy enough, I just need the columns.
my_list <- list(unique(sort(obs_IND$Action)))
obs_IND %>%
count(my_list) %>%
group_by(Operatie) %>%
tally()
Used sources:
Upvotes: 0
Views: 180
Reputation: 1999
I created some sample data:
operatie <- rep(c(rep(1,10), rep(2,10)),2)
Q <- rep(rep(c(rep('Q1',5),rep('Q2',5)),2),2)
action <- rep(rep(paste('action', 1:4),5),2)
df <- data.frame(operatie, Q, action)
library(dplyr)
library(tidyr)
We can group by operatie
, Q
and action
, and then count the instance with tally().
df_long <- df %>% group_by(operatie, Q, action) %>% tally()
df_long$action.Q <- paste(df_long$action,df_long$Q)
Now we can use the function spread
to create wide dataframe with columns for each combination of Q
and action
:
df_wide <- df_long %>% spread(action.Q, n, fill=0) %>% select(-c(Q,action))
Result
Q operatie `action 1 Q1` `action 1 Q2` `action 2 Q1` ...
<fct> <dbl> <dbl> <dbl> <dbl> ...
1 Q1 1 4 0 0 ...
2 Q1 1 0 0 2 ...
3 Q1 1 0 0 0 ...
4 Q1 1 0 0 0 ...
5 Q2 1 0 2 0 ...
6 Q2 1 0 0 0 ...
...
Upvotes: 1