TheNaidge

Reputation: 89

R - Creating new columns based on searching vector elements in a df

I would like to add columns to a df, where each new column is based on searching an existing column of the df for the values of a vector.

My original dataset contains web data where each row represents a page visited by a customer; the page visited is stored in df$URL. I also have a separate vector of web page URLs. Each element of this vector should become a new column whose value indicates whether the page visit in that row (df$URL) matches the column's URL (the vector element).

In short: I want to create one column per element of the vector (column name = vector element), holding 1 where that row's df$URL matches the element and 0 otherwise.

Every element of urlnames occurs in df$URL (though not in every row), but df$URL contains more URLs than the vector does (the vector holds only some of the most visited pages).

urlnames <- c("/home", "/login", "/contact")

df <- data.frame("URL" = c("/home", "/login", "/contact", "/chat", "/product-page"))

Manually I would do something like this (with dplyr, plus magrittr for the %<>% assignment pipe):

df %<>%
  mutate(home = ifelse(URL == "/home", 1, 0))

The variable name and the ifelse criterion should be replaced with the vector element. I don't know whether there are more efficient or neater ways of doing this.

I really want to learn how to do such things automatically rather than having to do manual mutate calls for each of these variables.

(BTW, I would also appreciate input on potential issues the URL slashes could cause in column names, e.g. /home as a variable.)

Hope I've been clear enough to explain my issue, apologies if not - it's my first post and I'm (obviously) new to R. Thank you!

Upvotes: 0

Views: 377

Answers (3)

hello_friend

Reputation: 5788

Longer answer (a bit late to the party), and not as succinct, elegant or efficient as the other answers, but it can be used for partial matches with only minor adjustments (removing the paste0 call that wraps urlnames):

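# Anchor each URL with ^ and $ so that grepl() only matches the whole string;
# grepl() is already vectorised over df$URL, and the unary + turns TRUE/FALSE into 1/0
setNames(as.data.frame(
  lapply(paste0("^", urlnames, "$"), function(x) {
    +grepl(x, df$URL)
  })
), urlnames)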
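
As a rough sketch of the partial-match variant mentioned above: the df2 data below is made up for illustration, and fixed = TRUE is an added assumption so that characters such as ? or . inside a URL are not read as regex syntax.

```r
urlnames <- c("/home", "/login", "/contact")

# Hypothetical data with longer URLs, not taken from the question
df2 <- data.frame(URL = c("/home", "/home/settings", "/login?next=/home", "/chat"))

# Without the ^...$ anchors, grepl() flags any URL that contains the pattern;
# fixed = TRUE treats the pattern as a literal string rather than a regex
partial <- setNames(as.data.frame(
  lapply(urlnames, function(x) +grepl(x, df2$URL, fixed = TRUE))
), urlnames)

partial
```

Note that "/login?next=/home" is flagged for both /home and /login here, which may or may not be what you want from a partial match.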

Upvotes: 0

nicola

Reputation: 24480

Try table:

table(1:nrow(df),df$URL)

#    /chat /contact /home /login /product-page
#  1     0        0     1      0             0
#  2     0        0     0      1             0
#  3     0        1     0      0             0
#  4     1        0     0      0             0
#  5     0        0     0      0             1

You can drop the columns you don't want afterwards and coerce to a data.frame if needed.

There are many ways to remove the columns. One is to replace the values that are not in urlnames with NA and reapply the above. Something like:

# factor() handles both character and factor URL columns and drops the unused levels
table(1:nrow(df), factor(replace(df$URL, which(!df$URL %in% urlnames), NA)))
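
The "drop the columns afterwards and coerce to a data.frame" route could look roughly like this. It is only a sketch; the names tab and wide are placeholders of mine.

```r
urlnames <- c("/home", "/login", "/contact")
df <- data.frame(URL = c("/home", "/login", "/contact", "/chat", "/product-page"))

# One row per visit, one column per distinct URL
tab <- table(seq_len(nrow(df)), df$URL)

# as.data.frame.matrix() keeps the wide layout (plain as.data.frame() on a
# table would return a long format with a Freq column); indexing by urlnames
# then keeps only the wanted columns, in the order of the vector
wide <- as.data.frame.matrix(tab)[, urlnames]
wide
```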

Upvotes: 2

Allan Cameron

Reputation: 173813

Something like this, using lapply?

setNames(as.data.frame(lapply(urlnames, function(x) +(x==df$URL))), urlnames)
#>   /home /login /contact
#> 1     1      0        0
#> 2     0      1        0
#> 3     0      0        1
#> 4     0      0        0
#> 5     0      0        0

What happens here is that we use lapply to create a list of vectors, with one vector for each member of urlnames. Each vector is filled with 1s and 0s depending on whether the element of urlnames was found at each position in df$URL. We then turn the list into a data frame and set its column names to the urlnames.
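
On the side question about slashes in column names, here is a minimal sketch of two options for attaching the result to the original df. The names flags, out, out2 and clean are placeholders of mine; which option is nicer depends on how the columns get used later.

```r
urlnames <- c("/home", "/login", "/contact")
df <- data.frame(URL = c("/home", "/login", "/contact", "/chat", "/product-page"))

flags <- setNames(as.data.frame(lapply(urlnames, function(x) +(x == df$URL))), urlnames)

# Option 1: keep the URLs as column names. They are not syntactic R names,
# so they have to be quoted with backticks (or accessed via [[ ]])
out <- cbind(df, flags)
out$`/home`

# Option 2: strip the leading slash so the names become ordinary identifiers
clean <- sub("^/", "", urlnames)
out2 <- cbind(df, setNames(flags, clean))
out2$home
```

Names like "/home" work fine as long as you remember the backticks; stripping the slash just saves typing in later dplyr calls.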

Upvotes: 1
