Mus
Mus

Reputation: 7550

How do I check to see if certain columns exist and, if not, how to then create them and populate them with zeroes?

I have a data frame which I manipulate and whose final form sometimes changes shape depending on whether or not certain values exist.

Once the data frame is processed and in its final form, I wish to reorder the columns into a particular order before writing to .csv.

However, because some of the columns won't always exist, I want to know if it is possible to check which columns exist and of the ones that do, I want them to follow a particular format and the ones that don't to be created and populated with zeroes.

I have one solution which I think is very clunky and could probably be improved significantly: in this example, I am checking to see if the column taken_offline exists within my dataset. If so, I want the columns to be reordered in a certain way with this column included and if not, I want taken_offline to be created and populated with zeroes whilst still being reordered in the same way.

Ideally, I want to be able to say "here is the order that the columns should be presented in. If the column doesn't exist, I want to create it and populate it with zeroes".

I appreciate that a good way might be to take a list of column names from my data frame (users) and to then check the column names against the desired column order (listed below). However, I am unsure how to implement this idea.

How can I do it?

The output columns should be in this order:

"date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined"

My code (checking for the existence of taken_offline):

if("taken_offline" %in% colnames(users_final)){
  users_final <- users_final[, c(
    "date",
    "storeName",
    "firstName",
    "lastName",
    "conversation-request",
    "conversation-accepted",
    "acceptance_rate",
    "conversation-missed",
    "taken_offline",
    "conversation-already-accepted",
    "total_missed",
    "conversation-declined"
  )]
  print("Taken offline occurrences.")
} else {
  users_final$taken_offline <- 0
  users_final <- users_final[, c(
    "date",
    "storeName",
    "firstName",
    "lastName",
    "conversation-request",
    "conversation-accepted",
    "acceptance_rate",
    "conversation-missed",
    "taken_offline",
    "conversation-already-accepted",
    "total_missed",
    "conversation-declined"
  )]
  print("No taken offline occurrences.")
}

Upvotes: 2

Views: 71

Answers (2)

Gregor Thomas
Gregor Thomas

Reputation: 146119

Simpler version of the other answers. Thanks to R's vector recycling, this can be done in two quick lines.

Give a name to your vector of columns, say all_cols. Then, calling your data dd

# add missing columns and set them equal to 0
dd[setdiff(all_cols, names(dd)] = 0
# put columns in desired order
dd = dd[all_cols]

Worked example:

all_cols = c("date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined")

dd = data.frame("date" = "yesterday",
"storeName" = "Kwik-E-Mart",
"firstName" = "Apu")

dd[setdiff(all_cols, names(dd))] = 0
dd = dd[all_cols]
dd
#        date   storeName firstName lastName conversation-request conversation-accepted acceptance_rate
# 1 yesterday Kwik-E-Mart       Apu        0                    0                     0               0
#   conversation-missed taken_offline conversation-already-accepted total_missed conversation-declined
# 1                   0             0                             0            0                     0

Upvotes: 2

akash87
akash87

Reputation: 3994

If you have a named vector, say varname for the name and order of the data s you would like then you could use:

var_not_present <- varname[which(!(varname %in% names(s)))]
h <- data.frame(matrix(0, ncol = length(var_not_present), nrow = dim(s)[1]))
colnames(h) <- var_not_present
s_updated <- cbind(s,h)
s_updated <- s_updated[varname]

Upvotes: 1

Related Questions