user394323
user394323

Reputation: 51

how to separate a column into multiple columns and change the results from characters to numbers

enter image description here

##id## ##initiativen##

1 abc 2a 2 cde 2b 3 efd a 4 geh c 5 jytd 5v 6 jydjytd e

Hello, I have something similar to this, just a lot bigger and I was wondering which is the most efficient way to divide the column initiativen into two columns, one containing the numbers (2,2,5,4) and one containing the letters or the blank space. it has to be a general formula as the data frame I need to apply it too is quite big. The letters correspond to a particular initiative number but the first initiative number is not indicated and "a" correspond to initiative number 2.

I would love it to look like something like that with the letters substituted by numbers (blank=1, a=2, b=3 etc..)

id    initiativen question
abc        2         2
cde        3         2
efd        2         N/A
geh        4         N/A
jytd       23        5
jydjytd    6         N/A
bfdhslbf   1         3

I have tried to use "separate" but it doesn't really work and doesn't solve the problem of the first initiative having no corresponding letter. Any help or suggestion would be extremely welcomed and helpful.

Thank you so much:)

Upvotes: 1

Views: 64

Answers (3)

LN_P
LN_P

Reputation: 1488

For the second column you can use regular expression to only keep numeric values:

df$initiativen <- gsub("[^0-9]", "", df$initiativen)

Upvotes: 0

s_baldur
s_baldur

Reputation: 33498

Using data.table

# Step one
    setDT(df)
df[, ":="(
      question  = gsub("[a-z]", "", initiativen),
      initiativen = match(gsub("[0-9]", "", initiativen), letters, nomatch = 0) + 1L
    )
   ]
df
        id initiativen question
1:     abc           2        2
2:     cde           3        2
3:     efd           2         
4:     geh           4         
5:    jytd          23        5
6: jydjytd           6         
7: vbdjfkb           1        4

# Then some tidying
df[, question := ifelse(nzchar(question), question, NA)]

df
        id initiativen question
1:     abc           2        2
2:     cde           3        2
3:     efd           2     <NA>
4:     geh           4     <NA>
5:    jytd          23        5
6: jydjytd           6     <NA>
7: vbdjfkb           1        4

Data

df <- data.frame(
  id = c("abc", "cde", "efd", "geh", "jytd", "jydjytd", "vbdjfkb"),
  initiativen = c("2a", "2b", "a", "c", "5v", "e", "4"),
  stringsAsFactors = FALSE
)

Edit

Can also be done in one step:

df[, question := gsub("[a-z]", "", initiativen)
   ][, ":="(
      question = ifelse(nzchar(question), question, NA),
      initiativen = match(gsub("[0-9]", "", initiativen), letters, nomatch = 0) + 1L
    )
   ]

Upvotes: 0

Maurits Evers
Maurits Evers

Reputation: 50668

How about the following tidyverse solution?

library(tidyverse);
df %>%
    separate(initiativen, into = c("p1", "p2"), sep = "(?<=[0-9])(?=[a-z])") %>%
    mutate(
        initiativen = case_when(
            str_detect(p1, "[a-z]") ~ p1,
            str_detect(p2, "[a-z]") ~ p2),
        question = case_when(
            str_detect(p1, "[0-9]") ~ p1,
            str_detect(p2, "[0-9]") ~ p2)) %>%
    mutate(initiativen = ifelse(is.na(initiativen), 1, match(initiativen, letters) + 1)) %>%
    select(-p1, -p2)
#       id initiativen question
#1     abc           2        2
#2     cde           3        2
#3     efd           2     <NA>
#4     geh           4     <NA>
#5    jytd          23        5
#6 jydjytd           6     <NA>
#7 vbdjfkb           1        4

Note that the warning can be safely ignored as it stems from the missing fields when separateing.

Explanation: We use a positive look-behind and look-ahead to split entries in initiativen into two parts p1 and p2; we then fill initiativen and question with entries from p1 or p2 depending on whether they contain a number "[0-9]" or a character "[a-z]"; convert characters to numbers with match(initiativen, letters) and finally clean the data.frame.


Sample data

df <- read.table(text =
    "       id initiativen
1     abc          2a
2     cde          2b
3     efd           a
4     geh           c
5    jytd          5v
6 jydjytd           e
7 vbdjfkb          4", row.names = 1)

Upvotes: 1

Related Questions