HAL-9000
HAL-9000

Reputation: 99

R - Extract highest value in a vector and replace using that value

I have a very large data frame structured like this:

User          A          B          C          D
1             1,0        0,0        0,2        1,1
2             1,1,1      0,0,1      0,0,0      2,2,1
3             1          0          0          1
4             2,1,2      1,1,1      3,2,1      1,1,1

I need to take the highest value out of each vector/cell, and replace that respective cell with the highest value. In other words, the data frame should look like this:

User          A          B          C          D
1             1          0          2          1
2             1          1          0          2
3             1          0          0          1
4             2          1          3          1

To create the first data frame, I have used aggregate based on User number. I thought about trying lapply with gsub, but that would require me to decide the specific combinations possible a priori, and that could lead to too many errors given the size of this data set.

Any help is very much appreciated!

Upvotes: 2

Views: 70

Answers (2)

akrun
akrun

Reputation: 886938

Here is an option with tidyverse

library(tidyverse)
f1 <- function(dot) {
        str_extract_all(dot, '[0-9]+') %>%
              map(as.numeric) %>%
              map_dbl(max)

df %>%
       mutate_at(vars(-User), funs(f1))
#  User A B C D
#1    1 1 0 2 1
#2    2 1 1 0 2
#3    3 1 0 0 1
#4    4 2 1 3 1

If we are using cSplit from splitstackshape, this could be made more compact

library(splitstackshape)
cSplit(df, names(df)[-1], ",", "long")[, lapply(.SD, max, na.rm = TRUE), User]
#   User A B C D
#1:    1 1 0 2 1
#2:    2 1 1 0 2
#3:    3 1 0 0 1
#4:    4 2 1 3 1

data

df <- structure(list(User = 1:4, A = c("1,0", "1,1,1", "1", "2,1,2"
), B = c("0,0", "0,0,1", "0", "1,1,1"), C = c("0,2", "0,0,0", 
"0", "3,2,1"), D = c("1,1", "2,2,1", "1", "1,1,1")), .Names = c("User", 
"A", "B", "C", "D"), class = "data.frame", row.names = c(NA, 
-4L))

Upvotes: 0

G5W
G5W

Reputation: 37641

Just split each entry and take max.

## create data 
df = read.table(text="User          A          B          C          D
1             '1,0'        '0,0'        '0,2'        '1,1'
2             '1,1,1'      '0,0,1'      '0,0,0'      '2,2,1'
3             '1'          '0'          '0'          '1'
4             '2,1,2'      '1,1,1'      '3,2,1'      '1,1,1'",
header=TRUE, stringsAsFactors=FALSE)

for(i in 2:5) {
    df[,i] = sapply(strsplit(df[,i], ","), function(x) max(as.numeric(x))) }
df
  User A B C D
1    1 1 0 2 1
2    2 1 1 0 2
3    3 1 0 0 1
4    4 2 1 3 1

Upvotes: 3

Related Questions