Silvia
Silvia

Reputation: 405

Concatenate data according to a partial numeric match

i have two data frames.

one is structured like this:

  code.  name.  
  1111   A B  
  1122   C D
  2122   C D
  2133   G H

the other is:

 code_2.  name.  
  11       F
  21       G

i want to obtain a third df that, in relation to code match, concatenate my data present in the first data frame, using a "OR" separator. The code value that I want to mantain is the the one of the second df. It is important that the match among code values would be made on the first and second number of the code belonging to the first dataframe.

 code.     name.  
  11     A B OR C D
  21     C D OR G H

thank you for your suggestions!

Upvotes: 1

Views: 55

Answers (2)

tmfmnk
tmfmnk

Reputation: 39858

If you prefer tidyverse, you can try something like:

df %>%
 group_by(code. = str_extract(as.character(code.), "^.{2}")) %>%
 summarise(name. = paste(name., collapse = " OR "))

  code. name.     
  <chr> <chr>     
1 11    A B OR C D
2 21    C D OR G H

It groups by the first two elements from "code." and then combines the "name." column based on those elements.

Or the same using sub():

df %>%
 group_by(code. = sub("^(.{2}).*", "\\1", as.character(code.))) %>%
 summarise(name. = paste(name., collapse = " OR "))

Or the same using substring():

df %>%
 group_by(code. = substring(as.character(code.), 1, 2)) %>%
 summarise(name. = paste(name., collapse = " OR "))

Upvotes: 2

Sotos
Sotos

Reputation: 51592

You can use aggregate, i.e.

aggregate(name. ~ substr(code., 1, 2), df, paste, collapse = ' OR ')
#  substr(code., 1, 2)      name.
#1                  11 A B OR C D
#2                  21 C D OR G H

You can take care of the column names as usual.

Upvotes: 3

Related Questions