Sia

Reputation: 37

How to write a for in for loop with nrow?

I have a dataset called "trip" with 900,000 records, each representing one trip. A column called "ID" holds the person ID for each individual. The same ID can appear more than once: a person with 1 trip has just one record, but a person with 7 trips has 7 rows (all with the same ID). I also have a column called "transport mode", which can take the values 1 (car), 2 (public transport), 3 (walk), and 4 (bike). Here are my variables:

ID: c(30001, 30002, 30002, 30002, 30002, 30002, 30002, 30002)

Then, I have a column called Transport_mode, which gives the mode for each of those trips:

Transport_mode : c(1, 2, 4, 3, 2, 1, 2, 1)

So, I made an empty variable called "public_fr" to hold the frequency of public transport trips. I want to write a for loop that counts the number of public transport trips for each ID. I tried the following:

for (i in 1:nrow(trip))
   {for (j in 1:nrow(trip$ID))
     {if (as.character(trip$Transport_mode[j] == 2)) (trip$public_fr[j] <- trip$public_fr[j] + 1)}

This code should give me:

public_fr: c(0, 3)

0 because ID = 30001 has no public transport trip (its only trip has transport mode 1), and 3 because ID = 30002 has three public transport trips (there are three 2s). However, it does not work. It gives this error:

Error in 1:nrow(trip$ID) : argument of length 0

Can you help me with that? If a similar question has already been answered, please post the link. Thanks.

Upvotes: 0

Views: 1431

Answers (3)

Gregor Thomas

Reputation: 145835

Your error is because trip$ID is just a vector, and vectors don't have rows. nrow(trip$ID) will give NULL, and 1:NULL gives the error that you see.
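A minimal sketch of that behaviour, using the eight values from the question as a stand-in for the real 900,000-row data (the name idx is only for illustration):

```r
# Toy stand-in for the asker's data (values copied from the question)
trip <- data.frame(
  ID = c(30001, 30002, 30002, 30002, 30002, 30002, 30002, 30002),
  Transport_mode = c(1, 2, 4, 3, 2, 1, 2, 1)
)

nrow(trip)        # number of rows of the data frame
nrow(trip$ID)     # NULL: a plain vector has no dim attribute
length(trip$ID)   # the vector's length, equal to nrow(trip) here
NROW(trip$ID)     # NROW() treats a vector as a one-column matrix

# seq_along() is a safe way to build indices over a vector
idx <- seq_along(trip$ID)
```

So if you do need the number of elements in a column, length() or NROW() is the call to reach for, not nrow().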

More generally, a for loop is a bad way to do this. There are many good ways to do things "by group" in a data frame: base R's aggregate(), or the dplyr and data.table packages, for example. Here's a dplyr version of your code:

library(dplyr)
trip %>%
  group_by(ID) %>%
  summarize(public_fr = sum(Transport_mode == 2))

In general, sum(condition) is a good way to count the number of times a condition is met, like sum(Transport_mode == 2) in this case.
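As a small illustration of why this works, here is the same idea on a bare vector. The variable names are made up for the example; the values are the Transport_mode values from the question.

```r
modes <- c(1, 2, 4, 3, 2, 1, 2, 1)  # Transport_mode values from the question

is_public <- modes == 2             # logical vector, TRUE where the mode is 2
n_public <- sum(is_public)          # TRUE counts as 1 and FALSE as 0
share_public <- mean(is_public)     # the same trick gives a proportion
```

The comparison produces a logical vector, and sum() coerces TRUE to 1 and FALSE to 0, so the total is the number of matches.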

If you really want to use a for loop (you shouldn't: it is harder to write and much slower), you should loop over unique ID values, not over each row:

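uid <- unique(trip$ID)               # one entry per person
public_fr <- integer(length(uid))    # pre-allocate the result
for (i in seq_along(uid)) {          # seq_along() is safe even if uid is empty
  # count this person's trips that used mode 2 (public transport)
  public_fr[i] <- sum(trip[trip$ID == uid[i], "Transport_mode"] == 2)
}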

The above loop looks at each unique ID, pulls the Transport_mode values corresponding to that ID, and uses the sum trick to count the 2s. But in R, this is a bad way to go. aggregate, dplyr, or data.table are much better.

Upvotes: 1

G. Grothendieck

Reputation: 269704

If trip is given by the code in the Note at the end then this gives a table of counts of ID vs. mode:

table(trip)

giving:

       Transport_mode
ID      1 2 3 4
  30001 1 0 0 0
  30002 1 3 1 1

Note

trip <- data.frame(
  ID = c(30001, 30002, 30002, 30002, 30002, 30002, 30002),
  Transport_mode = c(1, 2, 4, 3, 2, 1, 2))
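If only the public transport counts are wanted, one possible follow-up is to pull the column for mode 2 out of that table. This is a sketch on the Note data; the names counts and public_fr are chosen here for illustration.

```r
# Same data as in the Note
trip <- data.frame(
  ID = c(30001, 30002, 30002, 30002, 30002, 30002, 30002),
  Transport_mode = c(1, 2, 4, 3, 2, 1, 2))

counts <- table(trip)        # rows are IDs, columns are transport modes
public_fr <- counts[, "2"]   # column for mode 2, named by ID
```

Note that the column is selected by its label "2", so this assumes at least one trip in the data uses mode 2; otherwise that column does not exist and the subscript fails.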

Upvotes: 2

G5W

Reputation: 37641

You can do this in base R using aggregate.

aggregate(trip$Transport_mode == 2, list(trip$ID), sum)$x
[1] 0 3
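If you would rather keep the IDs next to the counts, a variant using the formula interface of aggregate might look like the sketch below. It uses the eight values from the question; the names res and public_fr are just labels chosen for the example.

```r
# Toy data with the values from the question
trip <- data.frame(
  ID = c(30001, 30002, 30002, 30002, 30002, 30002, 30002, 30002),
  Transport_mode = c(1, 2, 4, 3, 2, 1, 2, 1)
)

# One row per ID; the function counts the mode-2 trips within each group
res <- aggregate(Transport_mode ~ ID, data = trip,
                 FUN = function(m) sum(m == 2))
names(res)[2] <- "public_fr"   # rename the count column
```

The result is a data frame with one row per ID, which is easier to merge back onto other per-person data than a bare vector.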

Upvotes: 3
