Reputation: 1097
I have a data frame that lists the names of individuals and their monetary transactions in USD. The data are organized by district and record valid transactions made by either cash or credit card, like so:
X  Dist  transact.cash  transact.card
a  1     USD            USD
b  1     USD            USD
Where X is an individual whose transactions cover a fixed period of time, and Dist is the district where he/she resides. There are over 4000 observations in total, with approximately 80-100 rows per Dist. So far, sorting, slicing, and everything else have been simple operations, with dat.cash and dat.card being subsets of the table by mode of transaction; but I'm having problems extracting information by rank. For this, I have written a function where I specify a rank and the function should show the rows starting from that rank:
rankdat <- function(transact, numb) {
  # Truncated
  valid.nums = c('highest', 'lowest', 1:nrow(dat.cash)) # for cash subset
  if (transact == 'cash' && numb == 'highest') { # This is easy
    sort <- dat.cash[order(dat.cash[, 3], decreasing = T), ] # For sorting only cash data set
  } else if (transact == 'cash' && numb == 1:nrow(dat.cash)) {
    sort <- dat.cash[order(dat.cash[, 3], decreasing = T) == numb, ] # Not getting results here
  }
}
The last line returns NULL instead of the ranked transaction and all its rows. Replacing == with %in% still gives NULL, and using rank() doesn't change anything. For highest and lowest, it's not a big deal since it only involves simple sorting. If I specify rankdat('cash', 10), the function should return rows starting from the 10th highest transaction and decreasing from there, irrespective of Dist, similar to:
X  Dist  transact.cash
b  1     10th highest
h  2     11th highest
p  1     12th highest
and so on
Upvotes: 1
Views: 89
Reputation: 1155
This function is able to do that:
rankdat <- function(df, rank.by, num=10, method="top", decreasing=T){
  # ------------------------------------------------------
  # RANKDAT
  # ------------------------------------------------------
  # ARGUMENT
  # ========
  # df          Input dataFrame [d.f]
  # rank.by     Name of column(s) used to rank dataFrame
  # num         Selected row [num]
  # method      Method used to extract rows
  #               top      - to select top rank (e.g. 10 first rows)
  #               specific - to select specific row
  # decreasing  Sort in decreasing order? [logical]
  # ------------------------------------------------------
  # Order dataFrame by rank.by; decreasing must be passed to order(), not to with()
  eval(parse(text=paste("sort=df[with(df,order(",rank.by,", decreasing=",decreasing,")),]",sep="")))
  if(method %in% "top"){
    return(sort[1:num,])
  }else if(method %in% "specific"){
    return(sort[num,])
  }else{
    stop("Please select method used to extract data !!!")
  }
}
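The same idea can be written without eval(parse()) by indexing the ranking column by name. The following is only a minimal sketch, not the function above: the name rank_rows, the from argument, and the toy data are assumptions made for illustration, and it assumes the ranking column is numeric. It returns the rows from a given rank onward, which is what rankdat('cash', 10) in the question is meant to do.

```r
# Hypothetical helper (not from the original answer): return all rows of `df`
# from rank `from` onward, ranked by the column named in `rank.by`.
rank_rows <- function(df, rank.by, from = 1, decreasing = TRUE) {
  stopifnot(is.data.frame(df), rank.by %in% names(df))
  n <- nrow(df)
  stopifnot(is.numeric(from), length(from) == 1, from >= 1, from <= n)
  # order() returns row positions sorted by the ranking column
  ord <- order(df[[rank.by]], decreasing = decreasing)
  # keep the positions at ranks from, from + 1, ..., n
  df[ord[from:n], , drop = FALSE]
}

# Toy data, invented for illustration only
toy <- data.frame(X = letters[1:6],
                  Dist = c(1, 1, 2, 2, 1, 2),
                  transact.cash = c(50, 300, 120, 80, 10, 200),
                  stringsAsFactors = FALSE)

res <- rank_rows(toy, "transact.cash", from = 3)
print(res)
```

Passing the column name as a string and using df[[rank.by]] avoids building code from text, so a misspelled column name fails at the stopifnot() check instead of inside parse().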
Upvotes: 1
Reputation: 1155
Suppose that you have the following data.frame:
df=data.frame(X=c(rep('A',2),rep('B',3),rep('A',3),rep('B',2)),
Dist=c(rep(1,5),rep(0,5)),
transact.cash=c(rep('USD',5),rep('€',5)),
transact.card=c(rep('USD',5),rep('€',5)))
We obtain:
   X Dist transact.cash transact.card
1  A    1           USD           USD
2  A    1           USD           USD
3  B    1           USD           USD
4  B    1           USD           USD
5  B    1           USD           USD
6  A    0             €             €
7  A    0             €             €
8  A    0             €             €
9  B    0             €             €
10 B    0             €             €
If you would like to sort a data frame by multiple columns, such as transact.cash or transact.card, you can refer to the Stack Overflow question "How to sort a dataframe by column(s)". In your example, you only specified dat.cash, thus:
sort = df[order(df$transact.cash, decreasing=T),] # Order your dataFrame with transact.cash column
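order() also accepts several keys, with later keys breaking ties in earlier ones. A small sketch, assuming numeric transaction columns (the toy data frame below is invented for illustration and is not the df defined above):

```r
# Toy data, invented for illustration; amounts are numeric here
toy <- data.frame(X = c("a", "b", "c", "d"),
                  transact.cash = c(100, 250, 100, 250),
                  transact.card = c(30, 10, 70, 40),
                  stringsAsFactors = FALSE)

# Sort by cash amount, largest first; ties are broken by card amount, largest first
sorted <- toy[order(toy$transact.cash, toy$transact.card, decreasing = TRUE), ]
print(sorted)
```

The result is stored in sorted rather than sort so that the base function sort() is not masked.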
If you want to extract rows that satisfy a specific condition, you can use which() with == for numeric, double, or logical matches, or with %in% for string matches (== also works on strings; %in% is most useful when matching against several values). For example:
XA = df[which(df$X %in% "A"),] # Select row by user
XDist = df[which(df$Dist == 1),] # Select row by District
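For reference, here is a short sketch of how these selections behave on a toy data frame (invented for illustration). A logical vector can index rows directly, which() converts it to row positions, and %in% is convenient when matching against several values at once.

```r
# Toy data, invented for illustration only
toy <- data.frame(X = c("A", "A", "B", "C", "B"),
                  Dist = c(1, 2, 1, 3, 2),
                  stringsAsFactors = FALSE)

by_logical <- toy[toy$Dist == 1, ]          # logical index
by_which   <- toy[which(toy$Dist == 1), ]   # integer positions; same rows here
by_set     <- toy[toy$X %in% c("A", "C"), ] # match against several values

print(by_logical)
print(by_set)
```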
Finally, if you would like to select the first five rows after ordering:
sort[1:5,] # Select first five rows
sort[1:numb,] # Select first numb rows
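To start at a given rank rather than at the top, index the sorted rows from numb to the last row. The sketch below uses invented numeric amounts and a hypothetical numb. It also shows why comparing the output of order() to numb, as attempted in the question, does not select that rank: order() returns row positions, so the comparison only marks the place in the ranking where row number numb appears.

```r
# Toy data, invented for illustration only
toy <- data.frame(X = letters[1:5],
                  transact.cash = c(40, 90, 10, 70, 20),
                  stringsAsFactors = FALSE)
numb <- 2  # hypothetical rank to start from

ord <- order(toy$transact.cash, decreasing = TRUE)  # row positions, largest amount first
sorted <- toy[ord, ]

from_rank <- sorted[numb:nrow(sorted), ]  # ranks numb, numb + 1, ..., last
only_rank <- toy[ord[numb], ]             # just the row at rank numb

# The comparison from the question: TRUE where row number `numb` sits in the ranking,
# so it picks a row unrelated to the requested rank
wrong <- toy[ord == numb, ]

print(from_rank)
```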
With that, you can write a simple function to easily extract data from your data frame.
I hope this helps.
Upvotes: 1