shiv_90

Reputation: 1097

R - how to index rank and accordingly display a data frame?

I have a data frame that lists individuals and their monetary transactions in USD. The data are organised by district and by the valid transactions made either in cash or by credit card, like so:

X    Dist    transact.cash    transact.card
a    1       USD              USD
b    1       USD              USD

Here X is an individual, with his/her transactions over a fixed period of time, and Dist is the district where he/she resides. There are over 4000 observations in total, with approximately 80-100 rows per Dist. So far, sorting, slicing and everything else have been simple operations, with dat.cash and dat.card being subsets of the table by mode of transaction; but I'm having problems extracting information based on ranking the dataset. For this, I have written a function where I specify a rank, and the function should show the rows starting from that rank:

rankdat <- function(transact, numb) {
  # Truncated
  valid.nums = c('highest', 'lowest', 1:nrow(dat.cash)) # for cash subset
  if (transact == 'cash' && numb == 'highest') { # This is easy
    sort <- dat.cash[order(dat.cash[, 3], decreasing = T), ] # For sorting only cash data set
  } else if (transact == 'cash' && numb == 1:nrow(dat.cash)) {
    sort <- dat.cash[order(dat.cash[, 3], decreasing = T) == numb, ] # Not getting results here
  }
}

The last line returns NULL instead of the transaction at that rank and the rows after it. Replacing == with %in% still gives NULL, and using rank() doesn't change anything. The highest and lowest cases are not a big deal, since they only involve simple sorting. If I call rankdat('cash', 10), the function should return rows starting from the 10th highest transaction, in decreasing order and irrespective of Dist, similar to:

 X    Dist    transact.cash
 b    1       10th highest
 h    2       11th highest
 p    1       12th highest
 and  so      on

Upvotes: 1

Views: 89

Answers (2)

B.Gees

Reputation: 1155

This function is able to do that:

rankdat <- function(df, rank.by, num = 10, method = "top", decreasing = TRUE) {
  # ------------------------------------------------------
  # RANKDAT
  # ------------------------------------------------------
  # ARGUMENTS
  # =========
  # df          Input data frame
  # rank.by     Name(s) of the column(s) used to rank the data frame (character vector)
  # num         Number of rows ("top") or row position(s) ("specific")
  # method      Method used to extract rows
  #               top - select the top-ranked rows (e.g. the first 10 rows)
  #               specific - select a specific row
  # decreasing  Sort from highest to lowest if TRUE
  # ------------------------------------------------------
  # Pass decreasing to order() itself; unname() keeps column names from
  # being mistaken for order() arguments
  keys <- unname(as.list(df[rank.by]))
  sorted <- df[do.call(order, c(keys, list(decreasing = decreasing))), ]
  if (method == "top") {
    return(sorted[seq_len(num), ])
  } else if (method == "specific") {
    return(sorted[num, ])
  } else {
    stop("Please select the method used to extract data: 'top' or 'specific'")
  }
}
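
The question asks for the rows starting at a given rank, rather than only the first num rows or a single row. A minimal sketch of that variant follows, assuming the ranking column is numeric; the helper name rows_from_rank and the sample amounts are made up for illustration and are not part of the function above:

```r
# Hypothetical helper: return the rows from the numb-th ranked row onward.
rows_from_rank <- function(df, rank.by, numb, decreasing = TRUE) {
  stopifnot(numb >= 1, numb <= nrow(df))
  # Sort the whole data frame by the chosen column
  sorted <- df[order(df[[rank.by]], decreasing = decreasing), ]
  # Keep everything from position numb to the last row
  sorted[numb:nrow(sorted), ]
}

# Made-up sample data with numeric amounts
dat.cash <- data.frame(
  X = c("a", "b", "c", "d", "e"),
  Dist = c(1, 1, 2, 2, 1),
  transact.cash = c(50, 200, 120, 80, 10)
)

# Rows from the 3rd highest cash transaction downward
from_third <- rows_from_rank(dat.cash, "transact.cash", 3)
print(from_third)
```

Sorting once and then slicing by position avoids comparing the output of order() with the requested rank.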

Upvotes: 1

B.Gees

Reputation: 1155

Suppose that you have the following data.frame:

df=data.frame(X=c(rep('A',2),rep('B',3),rep('A',3),rep('B',2)),
               Dist=c(rep(1,5),rep(0,5)),
               transact.cash=c(rep('USD',5),rep('€',5)),
               transact.card=c(rep('USD',5),rep('€',5)))

We obtain:

   X Dist transact.cash transact.card
1  A    1           USD           USD
2  A    1           USD           USD
3  B    1           USD           USD
4  B    1           USD           USD
5  B    1           USD           USD
6  A    0             €             €
7  A    0             €             €
8  A    0             €             €
9  B    0             €             €
10 B    0             €             €

If you would like to sort a data frame by multiple columns (transact.cash and/or transact.card), see the Stack Overflow question "How to sort a dataframe by column(s)". In your example, you only specified dat.cash, thus:

sort = df[order(df$transact.cash, decreasing=T),] # Order your dataFrame with transact.cash column 
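
As a rough illustration of sorting by more than one column, order() accepts several keys and uses the later ones to break ties in the earlier ones. The data below are made up, with numeric amounts assumed so that negating a key reverses its direction:

```r
# Made-up numeric data for illustration
dat <- data.frame(
  X = c("a", "b", "c", "d"),
  Dist = c(2, 1, 2, 1),
  transact.cash = c(30, 75, 90, 20)
)

# Sort by Dist ascending, then by transact.cash descending within each Dist
by_dist <- dat[order(dat$Dist, -dat$transact.cash), ]
print(by_dist)
```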

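If you want to extract rows that satisfy a specific condition, put a logical test in the row index: use == to compare against a single value (numeric, logical or string) or %in% to match against a set of values. Wrapping the test in which() is optional, but it drops rows where the comparison is NA. For example: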

XA = df[which(df$X %in% "A"),] # Select row by user
XDist = df[which(df$Dist == 1),] # Select row by District
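
To show what which() changes in practice, here is a small sketch with a made-up data frame containing one missing district. It assumes nothing beyond base R:

```r
# Made-up data with one missing district
dat <- data.frame(X = c("a", "b", "c"), Dist = c(1, NA, 1))

with_which <- dat[which(dat$Dist == 1), ]  # the NA comparison is dropped
without_which <- dat[dat$Dist == 1, ]      # the NA comparison yields an all-NA row
with_in <- dat[dat$Dist %in% 1, ]          # %in% never returns NA

print(nrow(with_which))
print(nrow(without_which))
print(nrow(with_in))
```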

Finally, if you would like to select the first five rows after ordering:

sort[1:5,] # Select first five rows
sort[1:numb,] # Select first numb rows
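
It may help to see why comparing the output of order() with a rank does not pick the intended row: order() returns row positions in sorted order, whereas rank() returns the rank of each row. A small illustration with made-up amounts:

```r
# Made-up amounts for illustration
amounts <- c(50, 200, 120, 80)

ord <- order(amounts, decreasing = TRUE)  # row positions, largest first
rnk <- rank(-amounts)                     # rank of each element, 1 = largest

print(ord)
print(rnk)

# The 2nd highest amount, obtained by either route
second_by_order <- amounts[ord[2]]
second_by_rank <- amounts[rnk == 2]
```

Indexing with ord[2] uses the position stored in the second slot, while rnk == 2 tests each element's rank directly.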

With that, you can write a simple function to easily extract data from your data frame.

Hope it will help you

Upvotes: 1
