Separate column in dataframe into several other columns

Question

I have a dataframe:

                     value
2020-11-20 09:10:28:005 DEBUG  {EVENT-upload} [Item_create] increase values: user = "jbohl"
2020-11-20 09:11:10:055 DEBUG  {EVENT-upload} [Item_create] redirect: user = "msmith". limit test
2020-11-20 09:10:28:174 INFO  {EVENT-upload} [INPUT] new set: id = 12442, user = "msmith"

How can I split the column "value" into 6 columns, delimited by the timestamp and the bracketed fields? The desired result should look like this:

        timestamp           col2      col3          col4               col5              message
2020-11-20 09:10:28:005    DEBUG           {EVENT-upload}     [Item_create]      increase values: user = "jbohl"
2020-11-20 09:11:10:055    DEBUG           {EVENT-upload}     [Item_create]      redirect: user = "msmith". limit test
2020-11-20 09:10:28:174    INFO            {EVENT-upload}     [INPUT]            new set: id = 12442, user = "msmith"

dput:

df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG  {EVENT-upload} [Item_create] increase values: user = jbohl", "2020-11-20 09:11:10:055 DEBUG  {EVENT-upload} [Item_create] redirect: user = msmith. limit test", "2020-11-20 09:10:28:174 INFO  {EVENT-upload} [INPUT] new set: id = 12442, user = msmith" )), class = "data.frame", row.names = c(NA, -3L))

Ronak Shah · Accepted Answer

You can use tidyr's extract() and supply a regular expression with one capture group per new column.

# one capture group per new column, in the order the names are given
tidyr::extract(df, value,
               c('timestamp', paste0('col', 2:5), 'message'),
               '(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*([A-Z]+)\\s*(<.*?>)\\s*(\\{.*?\\})\\s*(\\[.*?\\])\\s*(.*)')

#               timestamp  col2   col3           col4          col5
#1 2020-11-20 09:10:28:005 DEBUG  {EVENT-upload} [Item_create]
#2 2020-11-20 09:11:10:055 DEBUG  {EVENT-upload} [Item_create]
#3 2020-11-20 09:10:28:174  INFO  {EVENT-upload}       [INPUT]

#                              message
#1       increase values: user = jbohl
#2 redirect: user = msmith. limit test
#3  new set: id = 12442, user = msmith

timestamp - extracts numbers that follow the pattern num-num-num num:num:num:num

col2 - extracts the uppercase text that follows

col3 - extracts the value in <.*?>

col4 - extracts the value in {.*?}

col5 - extracts the value in [.*?]

message - extracts all the remaining text
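The group-to-column mapping described above can also be sketched in base R with utils::strcapture(). This is only an illustration, not part of the original answer. It assumes the rows look exactly like the dput output shown here, which contains no <...> token, so that group is left out. The column names level, braces and brackets are made up for the example.

```r
# Sample rows copied from the dput output in this post
df <- data.frame(value = c(
  "2020-11-20 09:10:28:005 DEBUG  {EVENT-upload} [Item_create] increase values: user = jbohl",
  "2020-11-20 09:11:10:055 DEBUG  {EVENT-upload} [Item_create] redirect: user = msmith. limit test",
  "2020-11-20 09:10:28:174 INFO  {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"
), stringsAsFactors = FALSE)

# One capture group per output column (assumption: no <...> field in the rows)
pattern <- paste0(
  "^(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*",  # timestamp
  "([A-Z]+)\\s*",                                # uppercase word after the timestamp
  "(\\{.*?\\})\\s*",                             # value in curly braces
  "(\\[.*?\\])\\s*",                             # value in square brackets
  "(.*)$"                                        # all the remaining text
)

# The prototype gives the names and types of the new columns
# (level, braces and brackets are illustrative names)
proto <- data.frame(timestamp = character(), level = character(),
                    braces = character(), brackets = character(),
                    message = character(), stringsAsFactors = FALSE)

parsed <- utils::strcapture(pattern, df$value, proto, perl = TRUE)

parsed$level  # "DEBUG" "DEBUG" "INFO"
print(parsed)
```

If the real log lines do carry a <...> field, the pattern from the answer, with its extra (<.*?>) group, is the one to use. A row that does not match the pattern comes back as NA in every column.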

data

df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG  {EVENT-upload} [Item_create] increase values: user = jbohl", 
"2020-11-20 09:11:10:055 DEBUG  {EVENT-upload} [Item_create] redirect: user = msmith. limit test", 
"2020-11-20 09:10:28:174 INFO  {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"
)), class = "data.frame", row.names = c(NA, -3L))
