Reputation: 99
I have a very large data frame structured like this:
User A B C D
1 1,0 0,0 0,2 1,1
2 1,1,1 0,0,1 0,0,0 2,2,1
3 1 0 0 1
4 2,1,2 1,1,1 3,2,1 1,1,1
I need to take the highest value out of each vector/cell, and replace that respective cell with the highest value. In other words, the data frame should look like this:
User A B C D
1 1 0 2 1
2 1 1 0 2
3 1 0 0 1
4 2 1 3 1
To create the first data frame, I have used aggregate based on User number. I thought about trying lapply with gsub, but that would require me to decide the specific combinations possible a priori, and that could lead to too many errors given the size of this data set.
Any help is very much appreciated!
Upvotes: 2
Views: 70
Reputation: 886938
Here is an option with tidyverse
library(tidyverse)
f1 <- function(dot) {
str_extract_all(dot, '[0-9]+') %>%
map(as.numeric) %>%
map_dbl(max)
df %>%
mutate_at(vars(-User), funs(f1))
# User A B C D
#1 1 1 0 2 1
#2 2 1 1 0 2
#3 3 1 0 0 1
#4 4 2 1 3 1
If we are using cSplit
from splitstackshape
, this could be made more compact
library(splitstackshape)
cSplit(df, names(df)[-1], ",", "long")[, lapply(.SD, max, na.rm = TRUE), User]
# User A B C D
#1: 1 1 0 2 1
#2: 2 1 1 0 2
#3: 3 1 0 0 1
#4: 4 2 1 3 1
df <- structure(list(User = 1:4, A = c("1,0", "1,1,1", "1", "2,1,2"
), B = c("0,0", "0,0,1", "0", "1,1,1"), C = c("0,2", "0,0,0",
"0", "3,2,1"), D = c("1,1", "2,2,1", "1", "1,1,1")), .Names = c("User",
"A", "B", "C", "D"), class = "data.frame", row.names = c(NA,
-4L))
Upvotes: 0
Reputation: 37641
Just split each entry and take max.
## create data
df = read.table(text="User A B C D
1 '1,0' '0,0' '0,2' '1,1'
2 '1,1,1' '0,0,1' '0,0,0' '2,2,1'
3 '1' '0' '0' '1'
4 '2,1,2' '1,1,1' '3,2,1' '1,1,1'",
header=TRUE, stringsAsFactors=FALSE)
for(i in 2:5) {
df[,i] = sapply(strsplit(df[,i], ","), function(x) max(as.numeric(x))) }
df
User A B C D
1 1 1 0 2 1
2 2 1 1 0 2
3 3 1 0 0 1
4 4 2 1 3 1
Upvotes: 3