Mark Olson
Mark Olson

Reputation: 55

reducing number of rows in R

I am new in R programming language. I have a data set that has 2 columns(ID and Num) like this:

ID    Num
3       8
3      12
4      15
4      18
4      24

But I want to convert it to:

ID    Num
3     8 12
4     15 18 24

3 and 4 are still in column 'ID' but 8 and 12 are in one row near each other, in 'Num' column with 'ID' of 3. And also 4 is in column 'ID' and 15 18 and 24 are in one row near each other, in 'Num' column with ID of 4. Can anyone help me convert original data set to this new type. I searched a lot but I couldn't find R code of this problem anywhere.

Upvotes: 3

Views: 1203

Answers (3)

Jilber Urbina
Jilber Urbina

Reputation: 61164

You can also use aggregate

> aggregate(DF$Num~DF$ID, FUN=paste, sep=" ")
  DF$ID     DF$Num
1     3      8, 12
2     4 15, 18, 24

Alternatively, you can use data = parameter of aggregate to get the column names not have the DF$:

aggregate(data=DF, Num~ID, FUN=paste, sep=" ")
#   ID        Num
# 1  3      8, 12
# 2  4 15, 18, 24

Upvotes: 3

Arun
Arun

Reputation: 118799

Alternatively if you want the Num column to be a list, you can do something this:

using by:

do.call(rbind, by(df, df$ID, FUN=function(x) 
             data.frame(ID=x$ID[1], Num = I(list(x$Num)))))

#   ID        Num
# 3  3      8, 12
# 4  4 15, 18, 24

Or using split + lapply:

do.call(rbind, lapply(split(df, df$ID), function(x) 
               data.frame(ID=x$ID[1], Num=I(list(x$Num)))))

Or using plyr package:

require(plyr)
ddply(df, .(ID), function(x) data.frame(ID = x$ID[1], Num = I(list(x$Num))))

Or using data.table package:

require(data.table)
dt <- as.data.table(df)
dt[, list(Num = list(Num)),by = ID]

Upvotes: 1

Thilo
Thilo

Reputation: 9157

The problem with the data format you want to get is that it requires a varying number of columns. Of course, if you have at most three values for each id, you could just add three columns. But that will get rather complicated and hard to handle for ids with say 100 values.

On way around is to use lists. Here, The number of columns is not fixed anymore.

The way to archive what you want with lists is not difficult:

d <- data.frame(id=c(3,3,4,4,4), num=c(8,12,15,18,24))  # Just your sample data
l <- with(d, tapply(num, id, c))

What happens above? with just spares me the need for typing d$num and d$id and does nothing for the actual solution. The key lies in tapply. Here, we group all values of num by id and call c separately for those groups. tapply then collects the outputs and returns a datastructure that fits the resulting values best - in our case that is a list. The result:

> l
$`3`
[1]  8 12

$`4`
[1] 15 18 24

You can query only parts using

> l[[1]]      # The first element in the list
[1]  8 12
> l[['3']]    # The element with key (id) `3`
[1]  8 12

One more way. If you rather want the numbers pasted together as a string in a single column this of course is possible as well:

> with(d, tapply(num, id, paste, collapse=' '))
         3          4 
    "8 12" "15 18 24" 

Upvotes: 1

Related Questions