french_fries

Reputation: 1

Separate column in dataframe into several other columns

I have a dataframe:

                     value
2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = "jbohl"
2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = "msmith". limit test
2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = "msmith"

How can I split the column "value" into 6 columns, delimited by the timestamp and the bracketed fields? The desired result should look like this:

        timestamp           col2      col3          col4               col5              message
2020-11-20 09:10:28:005    DEBUG     <main>      {EVENT-upload}     [Item_create]      increase values: user = "jbohl"
2020-11-20 09:11:10:055    DEBUG     <main>      {EVENT-upload}     [Item_create]      redirect: user = "msmith". limit test
2020-11-20 09:10:28:174    INFO      <main>      {EVENT-upload}     [INPUT]            new set: id = 12442, user = "msmith"

dput:

df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl", "2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test", "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith" )), class = "data.frame", row.names = c(NA, -3L)) 

Upvotes: 1

Views: 28

Answers (1)

Ronak Shah

Reputation: 388862

You can use tidyr's extract() and supply a regular expression with one capture group per output column.

tidyr::extract(df, value,
               c('timestamp', paste0('col', 2:5), 'message'), 
               '(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*([A-Z]+)\\s*(<.*?>)\\s*({.*?})\\s*(\\[.*?\\])\\s*(.*)')

#               timestamp  col2   col3           col4          col5
#1 2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create]
#2 2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create]
#3 2020-11-20 09:10:28:174  INFO <main> {EVENT-upload}       [INPUT]

#                              message
#1       increase values: user = jbohl
#2 redirect: user = msmith. limit test
#3  new set: id = 12442, user = msmith

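timestamp - extracts the digits that follow the pattern num-num-num num:num:num:num

col2 - extracts the run of uppercase letters that follows (here DEBUG or INFO)

col3 - extracts the value enclosed in < >, matched by <.*?>

col4 - extracts the value enclosed in { }, matched by {.*?}

col5 - extracts the value enclosed in [ ], matched by \\[.*?\\]

message - extracts all the remaining text.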
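To see what each capture group picks up, the same pattern can be tried on a single line with base R. This is only a sketch: the variable names x, pattern, m and parts are made up for illustration, the braces are escaped here as a precaution (an assumption about portability across regex engines, not something the tidyr call above requires), and perl = TRUE is a choice made for this example.

```r
# One log line taken from the sample data.
x <- "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"

# The six capture groups, one per output column.
# Braces are escaped so the pattern does not rely on how an engine treats a bare "{".
pattern <- paste0(
  "(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*",  # timestamp
  "([A-Z]+)\\s*",                               # col2: uppercase word
  "(<.*?>)\\s*",                                # col3: <...>
  "(\\{.*?\\})\\s*",                            # col4: {...}
  "(\\[.*?\\])\\s*",                            # col5: [...]
  "(.*)"                                        # message: the rest of the line
)

# regexec() returns the full match followed by one entry per capture group.
m <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]

# Drop the full match and label the groups like the output columns.
parts <- setNames(m[-1], c("timestamp", paste0("col", 2:5), "message"))
print(parts)
```

If a line does not match the pattern, m comes back empty, which makes this a quick way to check unusual log lines before running extract() on the whole column.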

data

df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl", 
"2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test", 
"2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"
)), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 2
