Aviya Weiser
Aviya Weiser

Reputation: 13

Changing the structure of rows and columns in a data frame- R

I have a large database (90,000 * 1500) sorted by child observations - which includes their mom's info. I want to sort the database according to mom's data.

The problem is that each kid only appears once in DB mom bs. It may appear up to 10 times.

In addition, I want the number of rows to be a number of different mothers (approx. 40,000) and a bit of data for each child - between 0-10.

For example, the DB I have and the DB I want to create:

enter image description here

Upvotes: 0

Views: 539

Answers (3)

ThomasIsCoding
ThomasIsCoding

Reputation: 102625

Here is a solution similar to @Metariat, but with base R, where ave() is used

df$seq <- with(df,ave(id_num,mom,FUN = seq_along))
dfout <- reshape(df, idvar = "mom", timevar = "seq", direction = "wide")

such that

> dfout
  mom c.1 id_num.1 age.1  c.2 id_num.2 age.2
1   1  c1        1    12   c3        3     5
2   3  c2        2    15   c5        5    19
4   2  c4        4     8 <NA>       NA    NA

EDIT: If you have very big data frame, you can try the divide and conquer policy to see if it works

library(plyr)
dfs <- split(df,df$mom)
lst <- lapply(dfs, function(x) {
  x <- within(x,seqnum <- ave(id_num,mom,FUN = seq_along))
  reshape(x, idvar = "mom", timevar = "seqnum", direction = "wide")
}
)
dfout <- rbind.fill(lst)

Upvotes: 0

Metariat
Metariat

Reputation: 532

You could use reshape

 library(data.table)
 df = data.frame(
   'c' = c('c1', 'c2', 'c3', 'c4', 'c5'),
   'id_num' = seq(1,5),
   'age' = c(12, 15, 5, 8, 19),
   'mom'= c(1,3,1,2,3)
 )
 df
   c id_num age mom
1 c1      1  12   1
2 c2      2  15   3
3 c3      3   5   1
4 c4      4   8   2
5 c5      5  19   3
 df = setDT(df)[order(mom)]
 df[, id_child := seq(.N), mom]

 reshape(df, idvar = "mom", timevar = "id_child", direction = "wide")
   mom c.1 id_num.1 age.1  c.2 id_num.2 age.2
1:   1  c1        1    12   c3        3     5
2:   2  c4        4     8 <NA>       NA    NA
3:   3  c2        2    15   c5        5    19

Upvotes: 2

HarmenD
HarmenD

Reputation: 56

You can do this using the tidyr package, with group_by.

group_by(data, mom)

Then each mom contains a list of children. You can then sort the database as follows.

arrange(data, id_num, .by_group = TRUE)

To filter children between 0 and 10:

filter(data, age <= 10)

Upvotes: 0

Related Questions