Reputation: 11

R - Transposing portions of a data frame

I have a .dbf file I exported from ArcGIS 10.1 and I need to reorganize the it. An example of the data is:

       V1               V2
40.000000000000000 41.000000000000000

40.000000000000000 42.000000000000000

41.000000000000000 40.000000000000000

41.000000000000000 42.000000000000000

41.000000000000000 43.000000000000000

42.000000000000000 40.000000000000000

42.000000000000000 41.000000000000000

42.000000000000000 43.000000000000000

43.000000000000000 41.000000000000000

43.000000000000000 42.000000000000000

I need the data in a format where there is only one row for each unique value in the first column, with all of the corresponding values from the second column now appearing in that row, for example:

  V1                   V2              V3                  V4
40.000000000000000 41.000000000000000 42.000000000000000

41.000000000000000 40.000000000000000 42.000000000000000 43.000000000000000

42.000000000000000 40.000000000000000 41.000000000000000 43.000000000000000

43.000000000000000 41.000000000000000 42.000000000000000

If anyone can help me with this problem I would appreciate it. Thanks!

Upvotes: 1

Answers (3)

akrun

Reputation: 886998

You could also do this in dplyr

library(dplyr)
library(tidyr)
  dat%>% 
  group_by(X1) %>%
  mutate(Time=seq_along(X1)) 
  %>%spread(Time,X2)
 #Source: local data frame [4 x 4]

 #X1  1  2  3
#1 40 41 42 NA
#2 41 40 42 43
#3 42 40 41 43
#4 43 41 42 NA

Upvotes: 2

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

This is essentially a reshape problem, but you don't have a "time" variable.

You can easily create a "time" variable as follows:

dat$time <- with(dat, ave(X1, X1, FUN = seq_along))

From there, use reshape from base R...

reshape(dat, direction = "wide", idvar="X1", timevar="time")
#   X1 X2.1 X2.2 X2.3
# 1 40   41   42   NA
# 3 41   40   42   43
# 6 42   40   41   43
# 9 43   41   42   NA

... or dcast from "reshape2"...

library(reshape2)
dcast(dat, X1 ~ time, value.var="X2")
#   X1  1  2  3
# 1 40 41 42 NA
# 2 41 40 42 43
# 3 42 40 41 43
# 4 43 41 42 NA

Upvotes: 1

josliber

Reputation: 44309

You can split up your data frame using the split function on the first column and the use lapply to extract your vectors:

dat = data.frame(X1=c(40, 40, 41, 41, 41, 42, 42, 42, 43, 43),
                 X2=c(41, 42, 40, 42, 43, 40, 41, 43, 41, 42))
res <- lapply(split(dat, dat[,1]), function(d) c(d[1,1], sort(unique(d[,2]))))
res
# $`40`
# [1] 40 41 42
# 
# $`41`
# [1] 41 40 42 43
# 
# $`42`
# [1] 42 40 41 43
# 
# $`43`
# [1] 43 41 42

Most would probably prefer to keep the data in this format, but you can also combine the list into a matrix, right-padding the vectors with NA values:

max.len <- max(unlist(lapply(res, length)))
do.call(rbind, lapply(res, function(x) { length(x) <- max.len ; x }))
#    [,1] [,2] [,3] [,4]
# 40   40   41   42   NA
# 41   41   40   42   43
# 42   42   40   41   43
# 43   43   41   42   NA

Upvotes: 2

R - Transposing portions of a data frame

Answers (3)

Related Questions