I have a dataframe:
value
2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = "jbohl"
2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = "msmith". limit test
2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = "msmith"
How can I split the column "value" into six columns: the timestamp, each bracketed field (<...>, {...}, [...]) and the level before them, and the remaining message? The desired result should look like this:
timestamp                col2   col3    col4            col5           message
2020-11-20 09:10:28:005  DEBUG  <main>  {EVENT-upload}  [Item_create]  increase values: user = "jbohl"
2020-11-20 09:11:10:055  DEBUG  <main>  {EVENT-upload}  [Item_create]  redirect: user = "msmith". limit test
2020-11-20 09:10:28:174  INFO   <main>  {EVENT-upload}  [INPUT]        new set: id = 12442, user = "msmith"
dput:
df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl", "2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test", "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith" )), class = "data.frame", row.names = c(NA, -3L))
You can use tidyr's extract and provide a regular expression with one capture group for each output column.
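tidyr::extract(df, value,
               into = c('timestamp', paste0('col', 2:5), 'message'),
               # one capture group per output column; { and } escaped so the regex stays portable
               regex = '(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*([A-Z]+)\\s*(<.*?>)\\s*(\\{.*?\\})\\s*(\\[.*?\\])\\s*(.*)')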
# timestamp col2 col3 col4 col5
#1 2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create]
#2 2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create]
#3 2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT]
# message
#1 increase values: user = jbohl
#2 redirect: user = msmith. limit test
#3 new set: id = 12442, user = msmith
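- timestamp: extracts the numbers that follow the pattern num-num-num num:num:num:num
- col2: extracts the run of uppercase letters that follows (DEBUG or INFO)
- col3: extracts the value in <...>
- col4: extracts the value in {...}
- col5: extracts the value in [...]
- message: all the remaining text.

The lazy .*? makes each bracket group stop at its first closing bracket, and the \\s* between groups absorbs the spaces separating the fields.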
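The column-by-column breakdown above maps directly onto code. A minimal sketch, assuming base R only: parts is a hypothetical helper name, and each named element is the capture group for one output column. Joining the pieces with \\s* reproduces the answer's regex (with { and } escaped).

```r
# Hypothetical helper: one named regex piece per output column
parts <- c(
  timestamp = "(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)", # num-num-num num:num:num:num
  col2      = "([A-Z]+)",     # run of uppercase letters (the log level)
  col3      = "(<.*?>)",      # shortest text wrapped in <...>
  col4      = "(\\{.*?\\})",  # shortest text wrapped in {...}
  col5      = "(\\[.*?\\])",  # shortest text wrapped in [...]
  message   = "(.*)"          # everything that is left
)

# Glue the pieces together, allowing optional whitespace between fields
pattern <- paste(parts, collapse = "\\s*")

x <- c(
  "2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl",
  "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"
)

# Every sample line should match the assembled pattern
grepl(pattern, x, perl = TRUE)
```

Because the names line up with the output columns, the same objects can be passed to tidyr as tidyr::extract(df, value, into = names(parts), regex = pattern).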
Data:
df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl",
"2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test",
"2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"
)), class = "data.frame", row.names = c(NA, -3L))
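If tidyr is not available, the same six capture groups can be applied with base R. This is a sketch of an alternative, not part of the original answer: regexec() with perl = TRUE records the position of the full match and each group, and regmatches() turns those positions into strings. The names m, mat and res are illustrative only.

```r
# Data from the question (dput output)
df <- structure(list(value = c(
  "2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl",
  "2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test",
  "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"
)), class = "data.frame", row.names = c(NA, -3L))

# Same six capture groups as the tidyr::extract call, braces escaped
pattern <- "(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*([A-Z]+)\\s*(<.*?>)\\s*(\\{.*?\\})\\s*(\\[.*?\\])\\s*(.*)"

# One character vector per row: element 1 is the full match, then groups 1-6
m <- regmatches(df$value, regexec(pattern, df$value, perl = TRUE))

# Stack the rows into a matrix and drop the full-match column
mat <- do.call(rbind, m)[, -1, drop = FALSE]

res <- setNames(as.data.frame(mat, stringsAsFactors = FALSE),
                c("timestamp", paste0("col", 2:5), "message"))
res
```

One caveat of this version: a line that does not match yields character(0), which rbind() silently drops, so it is worth checking that nrow(res) equals nrow(df) on real log data.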