Reputation: 37
I have a dataset called "trip", including 900,000 records, showing trips. I have a column called "ID", which shows the person ID for an individual. However, here is the point. One individual might have 1 trip, so there is just one record for that ID, but another person might have 7 trips, resulting in 7 rows (with the same ID). Then, I have a column called "transport mode", which can have the values of 1 (for car), 2 (for public transport), 3 (for walk), and 4 (for bike) showing different transport options. Here are my variables:
ID: c(30001, 30002, 30002, 30002, 30002, 30002, 30002, 30002)
Then, I have a column called Transport_mode, relating to those IDs (trips):
Transport_mode : c(1, 2, 4, 3, 2, 1, 2, 1)
So, I made an empty variable called "public_fr" to show the frequency of public transport trips. I want to write a for loop that counts the number of public transport trips for each ID. So, I tried the following:
for (i in 1:nrow(trip))
{for (j in 1:nrow(trip$ID))
{if (as.character(trip$Transport_mode[j] == 2)) (trip$public_fr[j] <- trip$public_fr[j] + 1)}
This code should give me:
public_fr: c(0, 3)
0: because ID = 30001 has no public transport trip (its only trip has transport mode 1), and 3: because ID = 30002 has three public transport trips (there are three 2s). However, it does not work. It gives this error:
Error in 1:nrow(trip$ID) : argument of length 0
Can you help me with that? If there is a similar question with an answer, please share the link. Thanks.
Upvotes: 0
Views: 1431
Reputation: 145835
Your error is because trip$ID is just a vector, and vectors don't have rows. nrow(trip$ID) will give NULL, and 1:NULL gives the error that you see.
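To see the difference for yourself, here is a small sketch using the example values from the question (the data frame is rebuilt here only for illustration). It shows that nrow() works on the data frame but returns NULL for a single column, and that length() or seq_along() is what you would use on a vector:

```r
# Example data taken from the question
trip <- data.frame(
  ID = c(30001, 30002, 30002, 30002, 30002, 30002, 30002, 30002),
  Transport_mode = c(1, 2, 4, 3, 2, 1, 2, 1)
)

nrow(trip)          # number of rows of the data frame
nrow(trip$ID)       # NULL, because a plain vector has no dimensions
length(trip$ID)     # number of elements of the vector
seq_along(trip$ID)  # index sequence that is safe to loop over
```

So if you do need to index over a column, 1:nrow(trip) or seq_along(trip$ID) would work where 1:nrow(trip$ID) fails.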
More generally, a for loop is a bad way to do this. There are many good ways to do things "by group" in a data frame: stats::aggregate, or the dplyr and data.table packages, for example. Here's a dplyr version of your code:
library(dplyr)
trip %>%
group_by(ID) %>%
summarize(public_fr = sum(Transport_mode == 2))
In general, sum(condition) is a good way to count the number of times a condition is met, like sum(Transport_mode == 2) in this case.
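If the sum(condition) idea is new to you, this short sketch may help. It uses the Transport_mode values from the question; the names modes and is_public are just made up for the illustration. A comparison produces a logical vector, and sum() treats TRUE as 1 and FALSE as 0:

```r
# Transport_mode values from the question
modes <- c(1, 2, 4, 3, 2, 1, 2, 1)

# The comparison gives TRUE wherever the mode is 2 (public transport)
is_public <- modes == 2
is_public

# Summing the logical vector counts the TRUE values
n_public <- sum(is_public)
n_public
```

The same trick works with mean() if you want the share of public transport trips rather than the count.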
If you really want to use a for loop (you shouldn't: it is harder to write and much slower), you should loop over unique ID values, not over each row:
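uid <- unique(trip$ID)
public_fr <- integer(length(uid))
# seq_along() is safe even when uid is empty, unlike 1:length(uid)
for (i in seq_along(uid)) {
  # count the rows for this ID whose Transport_mode is 2
  public_fr[i] <- sum(trip[trip$ID == uid[i], "Transport_mode"] == 2)
}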
The above loop looks at each unique ID, pulls the Transport_mode values corresponding to that ID, and uses the sum trick to count the 2s. But in R, this is a bad way to go. aggregate, dplyr, or data.table are much better.
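For completeness, a data.table version might look roughly like the sketch below. It assumes the data.table package is installed, rebuilds the example data from the question, and stores the result in a variable called public, which is just a name picked for this example:

```r
library(data.table)

# Example data taken from the question, built as a data.table
trip <- data.table(
  ID = c(30001, 30002, 30002, 30002, 30002, 30002, 30002, 30002),
  Transport_mode = c(1, 2, 4, 3, 2, 1, 2, 1)
)

# One row per ID; sum() over the comparison counts the 2s within each group
public <- trip[, .(public_fr = sum(Transport_mode == 2)), by = ID]
public
```

If your data is an ordinary data frame, as.data.table(trip) or setDT(trip) should convert it first.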
Upvotes: 1
Reputation: 269704
If trip is given by the code in the Note at the end then this gives a table of counts of ID vs. mode:
table(trip)
giving:
       Transport_mode
ID      1 2 3 4
  30001 1 0 0 0
  30002 1 3 1 1
Note:
trip <- data.frame(
  ID = c(30001, 30002, 30002, 30002, 30002, 30002, 30002),
  Transport_mode = c(1, 2, 4, 3, 2, 1, 2))
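If only the public transport counts are wanted, one possible follow-up is to pull the column for mode 2 out of that table. This is a sketch using the Note data; counts and public_fr are names chosen here for illustration:

```r
# Data from the Note
trip <- data.frame(
  ID = c(30001, 30002, 30002, 30002, 30002, 30002, 30002),
  Transport_mode = c(1, 2, 4, 3, 2, 1, 2))

# Cross-tabulate ID against Transport_mode
counts <- table(trip)

# Column names of a table are character, so mode 2 is selected as "2"
public_fr <- counts[, "2"]
public_fr
```

The result is a vector named by ID, so public_fr["30002"] would look up a single person.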
Upvotes: 2
Reputation: 37641
You can do this in base R using aggregate.
aggregate(trip$Transport_mode == 2, list(trip$ID), sum)$x
[1] 0 3
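If you would rather keep the IDs next to the counts, a variant using the formula interface of aggregate could look like this. It is only a sketch on the example data from the question; res and the public_fr column name are assumptions made for the illustration:

```r
# Example data taken from the question
trip <- data.frame(
  ID = c(30001, 30002, 30002, 30002, 30002, 30002, 30002, 30002),
  Transport_mode = c(1, 2, 4, 3, 2, 1, 2, 1)
)

# For each ID, apply a function that counts how many modes equal 2
res <- aggregate(Transport_mode ~ ID, data = trip,
                 FUN = function(m) sum(m == 2))

# The result column keeps the name Transport_mode, so rename it
names(res)[2] <- "public_fr"
res
```

This returns a data frame with one row per ID, which is easier to merge back into other data than a bare vector.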
Upvotes: 3