datayoda
datayoda

Reputation: 1087

adding a column based on other values

I have a dataframe with millions of rows and three columns labeled Keywords, Impressions, Clicks. I'd like to add a column with values depending on the evaluation of this function:

isType <- function(Impressions, Clicks)
{ 
if (Impressions >= 1 & Clicks >= 1){return("HasClicks")} else if (Impressions >=1 & Clicks == 0){return("NoClicks")} else {return("ZeroImp")}
}

so far so good. I then try this to create the column but 1) it takes for ever and 2) it marks all the rows has "HasClicks" even the ones where it shouldn't.

# Creates a dataframe
Type <- data.frame()
# Loops until last row and store it in data.frame
for (i in c(1:dim(Mydf)[1])) {Type <- rbind(Type,isType(Mydf$Impressions[i], Mydf$Clicks[i]))}
# Add the column to Mydf
Mydf <- transform(Mydf, Type = Type)

input data:

Keywords,Impressions,Clicks
"Hello",0,0
"World",1,0
"R",34,23

Wanted output:

Keywords,Impressions,Clicks,Type
"Hello",0,0,"ZeroImp"
"World",1,0,"NoClicks"
"R",34,23,"HasClicks"

Upvotes: 4

Views: 8338

Answers (3)

Aaron - mostly inactive
Aaron - mostly inactive

Reputation: 37734

This is a case where arithmetic can be cleaner and most likely faster than nested ifelse statements.

Again building on Joshua's solution:

Mydf$Type <- factor(with(Mydf, (Impressions>=1)*2 + (Clicks>=1)*1),
                    levels=1:3, labels=c("ZeroImp","NoClicks","HasClicks"))

Upvotes: 0

Charles
Charles

Reputation: 4469

Building on Joshua's solution, I find it cleaner to generate Type in a single shot (note however that this presumes Clicks >= 0...)

Mydf$Type = ifelse(Mydf$Impressions >= 1,
    ifelse(Mydf$Clicks >= 1, 'HasClicks', 'NoClicks'), 'ZeroImp')

Upvotes: 10

Joshua Ulrich
Joshua Ulrich

Reputation: 176638

First, the if/else block in your function will return the warning:

Warning message:
In if (1:2 > 2:3) TRUE else FALSE :
the condition has length > 1 and only the first element will be used

which explains why it all the rows are the same.

Second, you should allocate your data.frame and fill in the elements rather than repeatedly combining objects together. I imagine this is causing your long run-times.

EDIT: My shared code. I'd love for someone to provide a more elegant solution.

Mydf <- data.frame(
  Keywords = sample(c("Hello","World","R"),20,TRUE),
  Impressions = sample(0:3,20,TRUE),
  Clicks = sample(0:3,20,TRUE) )

Mydf$Type <- "ZeroImp"
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks >= 1,
  "HasClicks", Mydf$Type)
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks == 0,
  "NoClicks", Mydf$Type)

Upvotes: 3

Related Questions