Reputation: 347
I have a dfrm of over 100 columns and 150 rows. I need to merge the contents of every 4 columns into 1 (preferably separated by a "/", although that is dispensable). For a single group this is simple enough with apply(dfrm[ ,1:4], 1, paste, collapse="/"), but I have difficulty scaling that solution to my whole df. In other words:
How can I go from this:
loc1 loc1.1 loc1.2 loc1.3 loc2 loc2.1 loc2.2 loc2.3
ind.1 257 262 228 266 204 245 282 132
ind.2 244 115 240 187 196 133 189 251
ind.3 298 139 216 225 219 276 192 254
ind.4 129 176 180 182 215 250 227 186
ind.5 238 217 284 240 131 184 247 168
To something like this:
loc1 loc2
ind.1 257/262/228/266 204/245/282/132
ind.2 244/115/240/187 196/133/189/251
ind.3 298/139/216/225 219/276/192/254
ind.4 129/176/180/182 215/250/227/186
ind.5 238/217/284/240 131/184/247/168
This is for a data frame of over 100 rows and columns. I've tried indexing the data frame as presented in the solution of this question, but after creating said index of every 4 columns I find myself lost while trying to perform do.call over my data frame. I'm sure there must be an easy solution for this, but please keep in mind that I'm anything but proficient in R.
Also, the colnames are not a real problem as long as the body is in shape, since I can extract the list of names with loc <- colnames(dfrm) and loc <- loc[c(T, F, F, F)], and then set colnames(dfrm) <- loc, although it would be nice to have this incorporated.
Upvotes: 5
Views: 330
Reputation: 52637
Way late to the party, but I think this is a little cleaner (and robust to column counts that are not a multiple of 4):
as.data.frame(
  lapply(
    split.default(df, (1:ncol(df) - 1) %/% 4),
    function(x) do.call(paste, c(x, list(sep="/")))
  )
)
Splitting the data frame by columns using (1:ncol(df) - 1) %/% 4 creates groups of four columns (or fewer for the last group if the column count is not a multiple of four), which then makes it trivial to pass each group on to paste. Note we have to use split.default because split.data.frame will attempt to split by row instead of column. This produces:
X0 X1
1 257/262/228/266 204/245/282/132
2 244/115/240/187 196/133/189/251
3 298/139/216/225 219/276/192/254
4 129/176/180/182 215/250/227/186
5 238/217/284/240 131/184/247/168
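The output above carries generic names (X0, X1) and drops the ind.* row names. Below is a minimal sketch of one way to restore them, assuming each merged column should take the name of the first column in its group. The names grp and merged are illustrative, and the data simply reproduces the question's example.

```r
# Example data copied from the question
df <- data.frame(
  loc1   = c(257, 244, 298, 129, 238),
  loc1.1 = c(262, 115, 139, 176, 217),
  loc1.2 = c(228, 240, 216, 180, 284),
  loc1.3 = c(266, 187, 225, 182, 240),
  loc2   = c(204, 196, 219, 215, 131),
  loc2.1 = c(245, 133, 276, 250, 184),
  loc2.2 = c(282, 189, 192, 227, 247),
  loc2.3 = c(132, 251, 254, 186, 168),
  row.names = paste0("ind.", 1:5)
)

# Group index per column: 0 0 0 0 1 1 1 1 ...
grp <- (seq_along(df) - 1) %/% 4

merged <- as.data.frame(
  lapply(split.default(df, grp),
         function(x) do.call(paste, c(x, list(sep = "/")))),
  stringsAsFactors = FALSE
)

# Assumption: name each merged column after the first column of its group
names(merged) <- names(df)[!duplicated(grp)]
rownames(merged) <- rownames(df)
merged
```

Because the names are taken from the first column of each group, this should also cope with a shorter final group.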
Upvotes: 3
Reputation: 93813
This is (hopefully) a more generalisable solution that doesn't rely on any positional arguments:
newnames <- gsub("\\.\\d+","",names(df))
#[1] "loc1" "loc1" "loc1" "loc1" "loc2" "loc2" "loc2" "loc2"
do.call(cbind,
lapply(unique(newnames), function(x)
do.call(paste,c(df[newnames %in% x],sep="/") )
)
)
# [,1] [,2]
#[1,] "257/262/228/266" "204/245/282/132"
#[2,] "244/115/240/187" "196/133/189/251"
#[3,] "298/139/216/225" "219/276/192/254"
#[4,] "129/176/180/182" "215/250/227/186"
#[5,] "238/217/284/240" "131/184/247/168"
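The result above is a character matrix without column names. If a data frame with the group names as column names is wanted, something along these lines might work. This is only a sketch: it uses a shortened version of the question's data, the names groups and merged are illustrative, and the trailing `$` in the pattern is an added assumption so that only a numeric suffix at the end of a name is stripped.

```r
# Shortened example: two columns per locus instead of four
df <- data.frame(loc1 = c(257, 244), loc1.1 = c(262, 115),
                 loc2 = c(204, 196), loc2.1 = c(245, 133),
                 row.names = c("ind.1", "ind.2"))

# Strip a trailing ".<digits>" so that related columns share one name
newnames <- sub("\\.\\d+$", "", names(df))
groups <- unique(newnames)

# Paste each group of columns row-wise; the named list becomes named columns
merged <- data.frame(
  lapply(setNames(groups, groups), function(g)
    do.call(paste, c(df[newnames == g], sep = "/"))),
  row.names = rownames(df),
  stringsAsFactors = FALSE
)
merged
```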
Upvotes: 0
Reputation: 685
Maybe this is faster.
df = data.frame(c1=LETTERS, c2=letters, c3=LETTERS, c4=letters)
do.call('paste',c(df[,1:2],list(sep='/')));
[1] "A/a" "B/b" "C/c" "D/d" "E/e" "F/f" "G/g" "H/h" "I/i" "J/j" "K/k" "L/l"
[13] "M/m" "N/n" "O/o" "P/p" "Q/q" "R/r" "S/s" "T/t" "U/u" "V/v" "W/w" "X/x"
[25] "Y/y" "Z/z"
do.call('paste',c(df[,3:4],list(sep='/')));
[1] "A/a" "B/b" "C/c" "D/d" "E/e" "F/f" "G/g" "H/h" "I/i" "J/j" "K/k" "L/l"
[13] "M/m" "N/n" "O/o" "P/p" "Q/q" "R/r" "S/s" "T/t" "U/u" "V/v" "W/w" "X/x"
[25] "Y/y" "Z/z"
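The two calls above handle one group of columns each. To cover a whole data frame without writing one call per group, the same do.call can be run over the starting index of each group. This is a rough sketch; width, starts and merged are illustrative names, and width would be 4 for the data in the question.

```r
df <- data.frame(c1 = LETTERS, c2 = letters, c3 = LETTERS, c4 = letters)

width  <- 2                              # columns per group (4 in the question)
starts <- seq(1, ncol(df), by = width)   # first column of each group

merged <- sapply(starts, function(s) {
  cols <- s:min(s + width - 1, ncol(df)) # guard against a short last group
  do.call(paste, c(df[cols], list(sep = "/")))
})

head(merged, 3)
```

Here sapply returns a character matrix with one column per group, which can be passed to as.data.frame if a data frame is preferred.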
Upvotes: 0
Reputation: 49448
This is certainly not pretty, but it works:
do.call(cbind, lapply(1:ceiling(ncol(df)/4), function(i)
apply(df[,seq(4*(i-1)+1, min(4*i, ncol(df))), drop = F],
1, paste, collapse = "/")))
# [,1] [,2]
#ind.1 "257/262/228/266" "204/245/282/132"
#ind.2 "244/115/240/187" "196/133/189/251"
#ind.3 "298/139/216/225" "219/276/192/254"
#ind.4 "129/176/180/182" "215/250/227/186"
#ind.5 "238/217/284/240" "131/184/247/168"
The ceiling and drop are there to survive edge cases when the number of columns is not divisible by 4. Also, note that the end result is a matrix here (thanks to the apply), and you can convert it back to a data.frame if you like (and assign whatever column names).
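As a small illustration of that last step and of the edge case, here is a sketch on made-up data with 6 columns, so the last group holds only 2. The toy data frame and the names m and out are assumptions for the example, not part of the question.

```r
# Toy data: 2 rows, 6 columns (V1..V6), so the last group has only 2 columns
df <- as.data.frame(matrix(1:12, nrow = 2))

m <- do.call(cbind, lapply(1:ceiling(ncol(df) / 4), function(i)
  apply(df[, seq(4 * (i - 1) + 1, min(4 * i, ncol(df))), drop = FALSE],
        1, paste, collapse = "/")))

# Convert the character matrix back to a data frame
out <- as.data.frame(m, stringsAsFactors = FALSE)

# Assumption: name each column after the first column of its group
names(out) <- names(df)[seq(1, ncol(df), by = 4)]
out
```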
Upvotes: 5