Vincenzoalfano

Reputation: 162

How to use the diff function of R on a column of a dataframe conditional on having a particular value in a different column

The dataset I am working with has the average ridership of different kinds of public transportation in different years. I would like to create a new column showing the change in average ridership from the previous year for each type of public transportation. The code I tried is the following:

for (i in 1:length(public_trans$type_of_public_transport)) {
  if (public_trans$type_of_public_transport[i] == public_trans$type_of_public_transport[i+1]) {
    ridership_diff[i] <- ifelse(public_trans$average_ridership == 0, 0, public_trans$average_ridership[i+1] - public_trans$average_ridership[i])
    next}}

The output I get running the code is this: "Error in if (public_trans$type_of_public_transport[i] == public_trans$type_of_public_transport[i + : missing value where TRUE/FALSE needed In addition: There were 50 or more warnings (use warnings() to see the first 50)"

By changing the start of the loop from "1:length(public_trans$type_of_public_transport))" to "0:length(public_trans$type_of_public_transport))", the output error becomes: "Error in if (public_trans$type_of_public_transport[i] == public_trans$type_of_public_transport[i + : argument is of length zero"

Also, even if my code worked, I'm pretty sure that there is an easier and more direct way to obtain the result I want.

Upvotes: 2

Views: 54

Answers (1)

akrun

Reputation: 887153

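The error occurs when the loop reaches the last row: i + 1 is then past the end of the vector, so the comparison returns NA and if() fails. Starting the index at 0 doesn't help, because R indexing starts at 1 and x[0] is a zero-length vector, hence the "argument is of length zero" error. One option is to stop the loop at the second-to-last row: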

# Preallocate the result; rows without a same-type successor stay NA
ridership_diff <- rep(NA_real_, nrow(public_trans))
for (i in 1:(length(public_trans$type_of_public_transport) - 1)) {
  if (public_trans$type_of_public_transport[i] ==
      public_trans$type_of_public_transport[i + 1]) {
    ridership_diff[i] <- ifelse(public_trans$average_ridership[i] == 0, 0,
      public_trans$average_ridership[i + 1] - public_trans$average_ridership[i])
  }
}
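Both error messages from the question can be reproduced on a small vector. The snippet below is only an illustration; the vector x and its values are made up and are not part of the OP's data.

```r
x <- c("bus", "bus", "tram")   # hypothetical values

past_end <- x[length(x) + 1]   # indexing past the end gives NA
at_zero  <- x[0]               # index 0 gives a zero-length vector

is.na(past_end)                # TRUE
length(at_zero)                # 0

# Comparing against these yields NA or a zero-length logical,
# and if() accepts neither of them as a condition
is.na(x[3] == past_end)        # TRUE
length(at_zero == x[1])        # 0
```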

ifelse() is vectorized, so the loop isn't needed:

# NA where the next row belongs to a different type, as in the loop above
ridership_diff <- with(public_trans,
    ifelse(type_of_public_transport[-1] != type_of_public_transport[-nrow(public_trans)], NA,
        ifelse(average_ridership[-nrow(public_trans)] == 0, 0,
            average_ridership[-1] - average_ridership[-nrow(public_trans)])))
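Since the question asks about diff(), another possibility is to apply diff() within each type using base R's ave(). This is a minimal sketch, not the OP's actual data: the sample values are invented, and the year column is an assumption (the question mentions years but doesn't name the column).

```r
# Hypothetical sample data; column names follow the question where given,
# 'year' and all values are assumptions made for illustration
public_trans <- data.frame(
  type_of_public_transport = c("bus", "bus", "bus", "tram", "tram"),
  year = c(2018, 2019, 2020, 2019, 2020),
  average_ridership = c(100, 120, 90, 50, 65)
)

# Sort so that rows of the same type are in year order
public_trans <- public_trans[order(public_trans$type_of_public_transport,
                                   public_trans$year), ]

# ave() applies FUN within each type; diff() returns one value fewer than
# its input, so pad with NA for the first year of each type
public_trans$ridership_diff <- ave(
  public_trans$average_ridership,
  public_trans$type_of_public_transport,
  FUN = function(x) c(NA, diff(x))
)

print(public_trans)
```

Here the difference is stored on the later row, so each value is the change from the year before and the first year of each type is NA. The loop above stores it on the earlier row instead, so pick whichever alignment suits the analysis.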

Upvotes: 1
