fleems
fleems

Reputation: 109

Counting text values across different columns, in to new columns


EDIT 1: Clarity and typo's, expanded more on example.


I have a dataset with one column (Action) that has text values. I want to count the unique values (for a given Operatie) and place these in a new column, according to their ID (== Operatie). There are 21 unique values within Action.

In the new data set it is important that the new column (which counts a single text value from Action) is linked to the value of Q.Operatie (has values Q1, Q2, Q3, Q4) and Operatie (1:100).

Thus, if we take the first 4 rows in our example then we would have a column named Q1.Delegerend == 2. Whereas the next column would be Q1.Goedaardig == 1. Because we have 2 occurrences of Delegerend and 1 occurrence of Goedaardig for Operation == 1. I ignore Instruerend for this example.

This leads to 4 (Q1.X:Q4.X) columns with each having their unique count of text values within their respected range. Thus, Q1.Delegerend, Q2.Delegerend, Q3.Delegerend, Q4.Delegerend for the row of Operatie == 1 for Delegerend (one of 21 unique). We might need to set up 'transition-matrices', hence the split of the observations.

An example of the raw data is shown below, the new data set will have columns for each unique value in a single row, see the example below the raw data.

   Operatie Tijdstip Berekening.voor.D Minuut.van.de.Operatie Berekening.voor.F Q.Operatie Actor Responder        Action Focus InterTeam
1         1 08:44:56             00:00                      1                1%        Q1      C        OA    Delegerend     1         b
2         1 08:45:43             00:00                      2                2%        Q1     C*        AM    Goedaardig     1         a
3         1 08:46:45             00:01                      3                4%        Q1     OA       OA*   Instruerend     3          
4         1 08:47:10             00:02                      3                4%        Q1      C       OA*    Delegerend     1         b
5         1 08:48:03             00:03                      4                6%        Q1      C      Team  Onderwijzend     1         b
6         1 08:48:44             00:03                      5                7%        Q1      C      Team Bewustwording     1         b
7         1 08:49:28             00:04                      6                8%        Q1     C*         C   Instruerend     1         b
8         1 08:50:30             00:05                      7                9%        Q1      C        C*  Onderwijzend     1         b
9         1 08:50:47             00:05                      7               10%        Q1      C        AM    Delegerend     1         a
10        1 08:51:47             00:06                      8               11%        Q1      C        OA   Instruerend     1         b

Thus, ultimately, I'd like to have one row (Operatie) with 21 columns with a frequency of that unique text (taken from column Action), sorted by the levels of Q.Operatie. Yes, this will lead to a lot of columns, 21 unique values times 4. But that's fine.

   Operatie Minuten Chirurg1 Chirurg2 Q1.Delegerend Q2.Delegerend Q3.Delegerend Q4.Delegerend Q1.Goedaardig
1         1      72       10       11           2          4            5            5
2         2      30       10       11           2          2            6            12
3         3     102        1        2           1          5            12            ...
4         4     212        2       NA           3         13            13
5         5      37        4       NA           1          2            ...
6         6      57        2       NA           3          9
7         7     120        3       NA           1          9
8         8     146        3       NA           1          6
9         9     143        2        9           3         10
10       10     189        9        2           3         12

So I tried making a list for dplyr to work with, see below. I didn't manage to get it to work fluently. I am under the impression that it is possible to call a list to count unique values over, not sure how to write that up using dplyr. I looked at a few posts, but I couldn't find anything about counting multiple rows in order to migrate it to a new dataset. However, the latter is easy enough, I just need the columns.

my_list <- list(unique(sort(obs_IND$Action)))

obs_IND %>% 
count(my_list) %>%
group_by(Operatie) %>%
tally()

Used sources:

Upvotes: 0

Views: 180

Answers (1)

otwtm
otwtm

Reputation: 1999

I created some sample data:

operatie <- rep(c(rep(1,10), rep(2,10)),2)
Q <- rep(rep(c(rep('Q1',5),rep('Q2',5)),2),2)
action <- rep(rep(paste('action', 1:4),5),2)
df <- data.frame(operatie, Q, action)

library(dplyr)
library(tidyr)

We can group by operatie, Q and action, and then count the instance with tally().

df_long <- df %>% group_by(operatie, Q, action) %>% tally()
df_long$action.Q <- paste(df_long$action,df_long$Q)

Now we can use the function spread to create wide dataframe with columns for each combination of Q and action:

df_wide <- df_long %>% spread(action.Q, n, fill=0) %>% select(-c(Q,action))

Result

  Q     operatie `action 1 Q1` `action 1 Q2` `action 2 Q1` ...
  <fct>    <dbl>         <dbl>         <dbl>         <dbl> ...
1 Q1           1             4             0             0 ...
2 Q1           1             0             0             2 ...
3 Q1           1             0             0             0 ...
4 Q1           1             0             0             0 ...
5 Q2           1             0             2             0 ...
6 Q2           1             0             0             0 ...
...

Upvotes: 1

Related Questions