Haley

Reputation: 43

How can I group by one variable in terms of status of a different variable in a longitudinal situation in R?

I'm new to R, so please go easy on me... I have some longitudinal data that looks like this

Basically, I'm trying to get a table with a) the number of unique cases whose data are all complete and b) the number of unique cases that have at least one incomplete or missing record. The end result would ideally be

this

df<- df %>% group_by(Location)
df1<- df %>% group_by(any(Completion_status=='Incomplete' | 'Missing'))

Upvotes: 3

Views: 55

Answers (1)

s__

Reputation: 9495

I'm not sure exactly what you want, because your request and the desired output seem inconsistent with each other. However, let's try: it seems you need a kind of frequency table, which you can build with base R. At the bottom of the answer you can find some data similar to yours.

# There are two cases, 'Complete' and everything else ('MorIn' = missing or incomplete),
# so add a new column that records which one each row is:
data$case <- ifelse(data$Completion_status == 'Complete', 'Complete', 'MorIn')

# Now build a frequency table of location by case, converted to a data.frame
result <- as.data.frame.matrix(table(data$Location, data$case))

# Move the location from the row names into its own column
result$Location <- rownames(result)

# Finally, build a data.frame with the final results. You can rename the columns,
# but if you want spaces in the names a tibble may be a better fit
result <- data.frame(Location = result$Location,
                     `Number.complete` = result$Complete,
                     `Number.incomplete.missing` = result$MorIn)

result
     Location Number.complete Number.incomplete.missing
1      London               0                         1
2 Los Angeles               0                         1
3       Paris               3                         1
4     Phoenix               0                         2
5     Toronto               1                         1

Or, if you prefer a dplyr chain:

library(dplyr)

data %>%
  # refer to the column directly inside mutate(), not via data$
  mutate(case = ifelse(Completion_status == 'Complete', 'Complete', 'MorIn')) %>%
  do(as.data.frame.matrix(table(.$Location, .$case))) %>%
  mutate(Location = rownames(.)) %>%
  select(3, 1, 2) %>%
  `colnames<-`(c("Location", "Number complete", "Number incomplete/missing"))
     Location Number complete Number incomplete/missing
1      London               0                         1
2 Los Angeles               0                         1
3       Paris               3                         1
4     Phoenix               0                         2
5     Toronto               1                         1

With data:

# Here is your data (next time, try to include it in a usable form in the question)
data <- data.frame(ID = c("A1","A1","A2","A2","B1","C1","C2","D1","D2","E1"),
                   Location = c('Paris','Paris','Paris','Paris','London','Toronto','Toronto','Phoenix','Phoenix','Los Angeles'),
                   Completion_status = c('Complete','Complete','Incomplete','Complete','Incomplete','Missing',
                                         'Complete','Incomplete','Incomplete','Missing'))
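
Note that the tables above count rows, not unique cases. If what you need is the number of unique IDs per location, as the wording of the question suggests, here is a minimal sketch under a few assumptions: that `ID` identifies a case, that a case is "complete" only when every one of its rows is 'Complete', and that missing values are stored as the string 'Missing' rather than as `NA`. The names `per_case`, `all_complete`, `n_complete` and `n_incomplete_missing` are made up for illustration.

```r
library(dplyr)

# Sample data, same as above
data <- data.frame(
  ID = c("A1","A1","A2","A2","B1","C1","C2","D1","D2","E1"),
  Location = c('Paris','Paris','Paris','Paris','London','Toronto','Toronto',
               'Phoenix','Phoenix','Los Angeles'),
  Completion_status = c('Complete','Complete','Incomplete','Complete','Incomplete',
                        'Missing','Complete','Incomplete','Incomplete','Missing')
)

# Step 1: one row per case (ID), flagging whether all of its records are complete
per_case <- data %>%
  group_by(Location, ID) %>%
  summarise(all_complete = all(Completion_status == 'Complete'), .groups = "drop")

# Step 2: count the flagged cases within each location
result <- per_case %>%
  group_by(Location) %>%
  summarise(n_complete = sum(all_complete),
            n_incomplete_missing = sum(!all_complete),
            .groups = "drop")

result
```

With the sample data, Paris comes out as 1 and 1 instead of 3 and 1: A1 is complete in both of its rows, while A2 has one incomplete row and so counts as incomplete. If your real data uses `NA` for missing values, `all()` would return `NA` for those cases, so you would need to handle that first.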

Upvotes: 2
