Reputation: 218

Merging rows of dataframe based on unique ID

I have a data frame where a unique subject ID is repeated twice for all participants. The data following seems to consist of one column for which the value is NA one of the entries, and one column where there is a value for one of the entries (though this is not certain and the method I use should account for the possibility of this not being true). Here is an example:

Name <- c("Jon", "Jon", "Maria", "Maria", "Tina", "Tina", "dan", 'dan', 'wen', 'wen')
a <- c(1, 1, 2, 2, 3, 4, 4, 4, 5, 6)
b <- c(NA, 1, NA, 2, NA, 3, NA, 4, NA, 5)
c <- c(1, NA, 2, NA, 3, NA, 4, NA, 5, NA)
df <- data.frame(Name, a, b, c)

The solution I thought of so far consists of looping through all the unique IDs (in the above example, Names) and making separate dataframes for each of the entries. Something like this:

#Instantiate list of lists that will become dfs
firstdf <- c()
seconddf <- c()

#Loop through existing df by unique ID (Name) and create 
# list containing values of 1 entry and list of the other 
for (i in unique(df$Name)) {
  innerlist1 <- c()
  innerlist2 <- c()
  
  for (x in c(1:length(df[df['Name'] == i]))) {
    if (x%%2 == 1) {
      # Takes one set of entries per ID
      innerlist1 <- c(innerlist1, df[df['Name'] == i][x])
      
    } else if (x%%2 == 0) {
      # Takes other set of entries per ID
      innerlist2 <- c(innerlist2, df[df['Name'] == i][x])
    }
  }
  firstdf <- c(firstdf, list(innerlist1))
  seconddf <- c(seconddf, list(innerlist2))
}
# Make dfs from lists
firstdf <- do.call(rbind.data.frame, firstdf)
names(firstdf) <- names(df)

seconddf <- do.call(rbind.data.frame, seconddf)
names(seconddf) <- names(df)

I would then proceed to combine the dfs by using something like merge, with by="Name". My original dataset is large and this is not particularly efficient or elegant. Can anyone suggest improvements?

Upvotes: 3

Answers (3)

JED HK

Reputation: 218

BTW, for future readers, what I eventually ended up doing was taking each entry per ID by odd/even index and making two dataframes like so:

firstdf <- df[seq_len(nrow(df))%%2 == 1, ]
seconddf <- df[seq_len(nrow(df))%%2 == 0, ]

After this point it's just a question of dropping columns where all entries are NAs and then merging the dfs, whilst dealing with situations where both dfs have non-NA values in the same location accordingly (e.g., by taking the mean of the two values).

Some extra steps I also had to take in my real-life situation that the simplicity of this example does not capture include:

Ordering the df and resetting the index so that the locations of entries in the df remained consistent in both dfs, like so:

df <- df[order(df$Name), ]
rownames(df) <- NULL

Checking that each name appeared exactly twice, not more and not less:

#Using dplyr 
library(dplyr)
df %>% 
     count(Name) %>%
     filter(n!=2)
# Should return 0 rows

In the case that there were more than or less than two entries, I did the following:

more <- df %>% 
        count(name) %>%
        filter(n>2)

df_more_than_two <- df[df$Name %in% more$Name]

# Change sign in filter function to < 2 for those with only one entry

I then made three dataframes (those with 1 entry, those with 2 entries, and those with 3 entries) but essentially performed the same steps

Upvotes: -1

Quinten

Reputation: 41225

If there are multiple non-NA values per ID, you can concentrate them toString. You can use the following code:

library(dplyr)
df %>% 
  group_by(Name) %>% 
  summarise_all(funs(toString(na.omit(.))))

Output:

# A tibble: 5 × 4
  Name  a     b     c    
  <chr> <chr> <chr> <chr>
1 dan   4, 4  4     4    
2 Jon   1, 1  1     1    
3 Maria 2, 2  2     2    
4 Tina  3, 4  3     3    
5 wen   5, 6  5     5

Upvotes: 1

Clemsang

Reputation: 5481

You can keep first value by group exluding NA:

library(dplyr)

df %>%
  group_by(Name) %>%
  summarise(a = first(stats::na.omit(a)), 
            b = first(stats::na.omit(b)), 
            c = first(stats::na.omit(c)))
# A tibble: 5 x 4
  Name      a     b     c
  <chr> <dbl> <dbl> <dbl>
1 dan       4     4     4
2 Jon       1     1     1
3 Maria     2     2     2
4 Tina      3     3     3
5 wen       5     5     5

Upvotes: 1

Merging rows of dataframe based on unique ID

Answers (3)

Related Questions