Reputation: 25
Hi, I am aware that there are similar questions, but their solutions did not seem to address my problem, so I wonder if anyone can help.
I have a large data frame, inside which there is a column like this:
result
A, B-C
A, C-D
E, F-G
...
I managed to split the column into three using:
df$new_result <- str_match(df$result, "^(.*),(.*)-(.*)$")[,-1]
Now part of the data frame looks like:
result new_result.1 new_result.2 new_result.3
A, B-C A B C
A, C-D A C D
E, F-G E F G
...
However, when I tried to call:
df$new_result.1
R gave me an error stating that "new_result.1" could not be found.
I have tried the following but none of them worked.
with(df, colsplit(df$result, pattern = "^(.*),(.*)-(.*)$", names = c('a', 'b', 'c')))
OR
names(df)[names(df) == 'new_result.1'] <- 'a'
OR
setNames(df, c(...,'a','b','c',...))
I think the problem is that "new_result.1", "new_result.2" and "new_result.3" cannot be found in the data frame; instead, they are referred to together as "new_result". Any idea how I can separate them so that I can later refer to the columns individually? Thanks!
Upvotes: 1
Views: 2081
Reputation: 181
Please try this
install.packages("do")
library(do)
df2=Replace(data = df,pattern = '-:,')
col_split(df2$result,',')
Upvotes: 0
Reputation: 13139
Following your approach, when we look at 'str(df)' we get this:
> str(df)
'data.frame': 3 obs. of 2 variables:
$ result : chr "A, B-C" "A, C-D" "E, F-G"
$ new_result: chr [1:3, 1:3] "A" "A" "E" " B" ...
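This is not surprising, as str_match returns a matrix. Assigning it to df$new_result stores the whole matrix as a single column named "new_result"; the names "new_result.1" etc. appear only when the data frame is printed, so df$new_result.1 does not exist.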
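To make the matrix-column behaviour concrete, here is a minimal sketch that rebuilds the three example rows from the question (assuming the stringr package is installed). It only illustrates why the lookup fails and how the matrix can still be indexed; it is not meant as the final fix.

```r
library(stringr)

# Toy data reconstructed from the three rows shown in the question
df <- data.frame(result = c("A, B-C", "A, C-D", "E, F-G"),
                 stringsAsFactors = FALSE)

# Same assignment as in the question: the 3x3 matrix becomes ONE column
df$new_result <- str_match(df$result, "^(.*),(.*)-(.*)$")[, -1]

names(df)                 # only "result" and "new_result"
is.matrix(df$new_result)  # TRUE

# The pieces are still reachable by indexing into the matrix column
first <- df$new_result[, 1]
```

Indexing with df$new_result[, 1] works as a stopgap, but separate columns are usually easier to handle later on.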
An approach to fix this is the following:
Create a 'splitted' data frame with the relevant column names:
splitted <- data.frame(str_match(df$result, "^(.*),(.*)-(.*)$")[,-1],
stringsAsFactors=FALSE)
colnames(splitted) <- paste0("new_result.",seq_len(ncol(splitted)))
Then cbind everything together (starting from the original df, without the matrix column):
df <- cbind(df,splitted)
> str(df)
'data.frame': 3 obs. of 4 variables:
$ result : chr "A, B-C" "A, C-D" "E, F-G"
$ new_result.1: chr "A" "A" "E"
$ new_result.2: chr " B" " C" " F"
$ new_result.3: chr "C" "D" "G"
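Note that new_result.2 keeps the space that follows the comma (" B", " C", " F"). If that is unwanted, one possible variation is to trim every new column with base R's trimws before binding. This is a sketch on the same toy data, assuming stringr is installed; the intermediate name parts is only for illustration.

```r
library(stringr)

df <- data.frame(result = c("A, B-C", "A, C-D", "E, F-G"),
                 stringsAsFactors = FALSE)

# Split into a character matrix, then convert to a regular data frame
parts <- str_match(df$result, "^(.*),(.*)-(.*)$")[, -1]
splitted <- data.frame(parts, stringsAsFactors = FALSE)
colnames(splitted) <- paste0("new_result.", seq_len(ncol(splitted)))

# Strip leading/trailing whitespace from every new column
splitted[] <- lapply(splitted, trimws)

df <- cbind(df, splitted)
df$new_result.2
```

After this, df$new_result.1, df$new_result.2 and df$new_result.3 are ordinary character columns and can be renamed with names(df) as usual.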
Upvotes: 3