Reputation: 81
splitted is a list of data frames produced by calling split() on the main data frame.
After splitting, I apply a function to every data frame in the list.
Here is the function:
library(dplyr)  # provides %>%, arrange, top_n, select, filter, row_number

getCustomer <- function(df, numberOfProducts = 3){
  # One summary value per customer
  Gender <- unique(df$gender)
  Segment <- unique(df$Segment)
  Net_Discount <- sum(df$Discount * df$Sales)
  Number_of_Discounts <- sum(df$Discount>0)
  Customer.ID <- unique(df$Customer.ID)
  Sales <- sum(df$Sales)
  Profit <- sum(df$Profit)
  lat <- mean(df$lat)
  lon <- mean(df$lon)
  productsData <- df %>% arrange(Order.Date) %>% top_n(n =numberOfProducts)
  Products <- 0
  Products_Category <- 0
  Products_Order_Date <- 0
  # Collect ID, category and order date for each of the selected products
  for (j in 1:numberOfProducts){
    Products[j] <- productsData %>% select(Product.ID) %>% filter(row_number()==j)
    Products_Category[j] <- productsData %>% select(Category) %>% filter(row_number()==j)
    Products_Order_Date[j] <- productsData %>% select(Order.Date) %>% filter(row_number()==j)
    names(Products)[j]<-paste("Product",j)
    names(Products_Category)[j]<-paste("Category Product",j)
    names(Products_Order_Date)[j]<-paste("Order Date Product",j)
  }
  output <- data.frame(Customer.ID, Gender,Segment, Net_Discount, Number_of_Discounts, Sales, Profit,
                       Products, Products_Category, Products_Order_Date, lon,lat)
  return(output[1,])
}
I get the right answer for any single element of splitted:
getCustomer(splitted[[687]],2)
Assigning the elements one at a time also works:
customer <- list()
customer[[1]]<- getCustomer(splitted[[1]],2)
customer[[2]]<- getCustomer(splitted[[2]],2)
.
.
.
customer[[1576]]<- getCustomer(splitted[[1576]],2)
That is, I can effectively build the whole customer list by assigning element by element.
However, I certainly don't have time for that (1576 single line data frames to assign to the customer list), so I'm trying:
customer <- list()
for (i in 1:length(splitted)){
customer[[i]]<-getCustomer(splitted[[i]],2)
}
After running this last chunk of code, I get:
Error in data.frame(Customer.ID, Gender, Segment, Net_Discount, Number_of_Discounts, : arguments imply differing number of rows: 0, 1
I can't understand this error, since I can build the customer list one element at a time.
I would appreciate your help.
Solution
Editing this question to let you know the problem was indeed that some data frames in splitted had no rows. So I removed them (only 3).
l <- integer(length(splitted))  # row count of each data frame
for (i in seq_along(splitted)){
  l[i]<-nrow(splitted[[i]])
}
indices<- which(l==0)
splitted<-splitted[-indices]
Just had to delete 3 samples. Got no error this time. Thank you all for your time.
Upvotes: 1
Views: 228
Reputation: 81
The problem was indeed that some data frames in splitted had no rows. So I removed them (only 3).
l <- integer(length(splitted))  # row count of each data frame
for (i in seq_along(splitted)){
  l[i]<-nrow(splitted[[i]])
}
indices<- which(l==0)
splitted<-splitted[-indices]
Just had to delete 3 samples.
Got no error this time. Thank you all for your time.
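The same filtering can be written without a loop. This is a minimal sketch on a toy list (toy, n_rows and kept are illustrative names, not from the original data). It also shows why logical indexing is a little safer than negative indices: if no data frame is empty, which() returns integer(0), and indexing with -integer(0) drops every element instead of none.

```r
# Toy stand-in for splitted (assumption for illustration): one empty data frame
toy <- list(
  data.frame(x = 1:2),
  data.frame(x = integer(0)),
  data.frame(x = 3L)
)

# Row count of every element, then keep only the non-empty ones
n_rows <- vapply(toy, nrow, integer(1))
kept <- toy[n_rows > 0]

# Pitfall with negative indexing: when nothing is empty, everything is dropped
all_full <- list(data.frame(x = 1L), data.frame(x = 2L))
none_empty <- which(vapply(all_full, nrow, integer(1)) == 0)  # integer(0)
dropped_everything <- all_full[-none_empty]
```

Applied to the real list, the logical-index form would be splitted[vapply(splitted, nrow, integer(1)) > 0].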
Upvotes: 1
Reputation: 4551
My usual strategy for troubleshooting something like this is to run it in chunks. If you use the for loop, check the value of i when the error occurs. With lapply, I would run chunks of around 20 elements and keep going until you find which data frame in your list is causing the error.
Then, run through your function manually with that data frame and look at what output you get. For example:
df <- splitted[[30]] # assuming #30 is the problem
numberOfProducts <- 3
Now walk through the function body line by line and check each result until you find what causes the error. Keep in mind that if there are multiple places where problems can occur, it might take more than one application of this process to solve all the problems.
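The search for the failing element can also be automated. The sketch below is only an illustration: toy and f are hypothetical stand-ins for splitted and getCustomer, and f is written to fail on empty input the same way data.frame() does in the question. tryCatch turns each error into TRUE, so which() lists the indices worth inspecting by hand.

```r
# Toy stand-ins (assumptions for illustration): the second data frame is empty
toy <- list(
  data.frame(id = 1, sales = c(10, 20)),
  data.frame(id = numeric(0), sales = numeric(0)),
  data.frame(id = 3, sales = 5)
)

# Hypothetical summary function; substitute getCustomer for real use
f <- function(df) data.frame(id = unique(df$id), sales = sum(df$sales))

# TRUE for every element on which f() raises an error
failed <- vapply(toy, function(df) {
  tryCatch({ f(df); FALSE }, error = function(e) TRUE)
}, logical(1))

bad_indices <- which(failed)
bad_indices
```

Each index in bad_indices can then be pulled out with toy[[i]] and stepped through manually as described above.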
Upvotes: 0
Reputation: 522712
Just use lapply, which can apply a function to every element of a list, returning a list in the process:
numberOfProducts <- 2
result <- lapply(splitted, function(x) getCustomer(x, numberOfProducts))
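As a small illustration of what lapply returns, here is a sketch on toy data (toy and summarise_one are hypothetical names, not part of the question). Extra arguments can be passed through lapply directly instead of wrapping the call in an anonymous function, and the resulting list of one-row data frames can be stacked with do.call(rbind, ...) if a single data frame is wanted.

```r
# Toy stand-in for splitted (assumption for illustration)
toy <- list(
  a = data.frame(id = 1, sales = c(10, 20)),
  b = data.frame(id = 2, sales = 5)
)

# Hypothetical one-row summary, playing the role of getCustomer
summarise_one <- function(df, digits = 0) {
  data.frame(id = unique(df$id), sales = round(sum(df$sales), digits))
}

# Arguments after the function are forwarded to every call
result <- lapply(toy, summarise_one, digits = 1)

# Stack the one-row data frames into a single data frame
combined <- do.call(rbind, result)
```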
Edit:
It looks like your function's logic can sometimes result in a data frame with no rows. In that case, you can check for an empty data frame and return NA:
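# Check the input first: data.frame() itself fails when df has no rows,
# and ifelse() is vectorised, so it cannot return a whole data frame row
if (nrow(df) == 0) return(NA)
output <- data.frame(Customer.ID, Gender,Segment, Net_Discount, Number_of_Discounts, Sales,
Profit, Products, Products_Category, Products_Order_Date, lon, lat)
return(output[1,])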
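To see where the "differing number of rows: 0, 1" message comes from, here is a minimal reproduction on made-up data (empty, ids, total and safe_summary are illustrative names). On an empty data frame, unique() returns a zero-length vector while sum() still returns a single 0, and data.frame() refuses to combine the two lengths. The guard therefore has to run before data.frame() is called.

```r
# An empty data frame, standing in for one of the empty elements of splitted
empty <- data.frame(id = character(0), sales = numeric(0))

ids <- unique(empty$id)      # length 0
total <- sum(empty$sales)    # length 1: the sum of nothing is 0

# Capture the error message instead of stopping
msg <- tryCatch(
  { data.frame(ids, total); "no error" },
  error = function(e) conditionMessage(e)
)

# Hypothetical guarded summary: bail out before data.frame() is reached
safe_summary <- function(df) {
  if (nrow(df) == 0) return(NA)
  data.frame(id = unique(df$id), sales = sum(df$sales))
}

guarded <- safe_summary(empty)
```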
Upvotes: 1