Reputation: 7550
I have a data frame which I manipulate and whose final form sometimes changes shape depending on whether or not certain values exist.
Once the data frame is processed and in its final form, I wish to reorder the columns into a particular order before writing to .csv.
However, because some of the columns won't always exist, I want to know if it is possible to check which columns exist and of the ones that do, I want them to follow a particular format and the ones that don't to be created and populated with zeroes.
I have one solution which I think is very clunky and could probably be improved significantly: in this example, I am checking to see if the column taken_offline
exists within my dataset. If so, I want the columns to be reordered in a certain way with this column included and if not, I want taken_offline
to be created and populated with zeroes whilst still being reordered in the same way.
Ideally, I want to be able to say "here is the order that the columns should be presented in. If the column doesn't exist, I want to create it and populate it with zeroes".
I appreciate that a good way might be to take a list of column names from my data frame (users
) and to then check the column names against the desired column order (listed below). However, I am unsure how to implement this idea.
How can I do it?
The output columns should be in this order:
"date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined"
My code (checking for the existence of taken_offline
):
if("taken_offline" %in% colnames(users_final)){
users_final <- users_final[, c(
"date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined"
)]
print("Taken offline occurrences.")
} else {
users_final$taken_offline <- 0
users_final <- users_final[, c(
"date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined"
)]
print("No taken offline occurrences.")
}
Upvotes: 2
Views: 71
Reputation: 146119
Simpler version of the other answers. Thanks to R's vector recycling, this can be done in two quick lines.
Give a name to your vector of columns, say all_cols
. Then, calling your data dd
# add missing columns and set them equal to 0
dd[setdiff(all_cols, names(dd)] = 0
# put columns in desired order
dd = dd[all_cols]
Worked example:
all_cols = c("date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined")
dd = data.frame("date" = "yesterday",
"storeName" = "Kwik-E-Mart",
"firstName" = "Apu")
dd[setdiff(all_cols, names(dd))] = 0
dd = dd[all_cols]
dd
# date storeName firstName lastName conversation-request conversation-accepted acceptance_rate
# 1 yesterday Kwik-E-Mart Apu 0 0 0 0
# conversation-missed taken_offline conversation-already-accepted total_missed conversation-declined
# 1 0 0 0 0 0
Upvotes: 2
Reputation: 3994
If you have a named vector, say varname
for the name and order of the data s
you would like then you could use:
var_not_present <- varname[which(!(varname %in% names(s)))]
h <- data.frame(matrix(0, ncol = length(var_not_present), nrow = dim(s)[1]))
colnames(h) <- var_not_present
s_updated <- cbind(s,h)
s_updated <- s_updated[varname]
Upvotes: 1