Ktal

Reputation: 25

Rename the columns after a split in R

Hi, I am aware that there are similar questions, but their solutions did not seem to address my problem, so I wonder if anyone can help.

I have a large data frame, inside which there is a column like this:

result
A, B-C
A, C-D
E, F-G
...

I managed to split the column into three using:

df$new_result <- str_match(df$result, "^(.*),(.*)-(.*)$")[,-1]

Now part of the data frame looks like:

result    new_result.1    new_result.2    new_result.3
A, B-C        A               B                C
A, C-D        A               C                D
E, F-G        E               F                G
...    

However, when I tried to call:

df$new_result.1

R gave me an error stating that "new_result.1" could not be found.

I have tried the following but none of them worked.

with(df, colsplit(df$result, pattern = "^(.*),(.*)-(.*)$", names = c('a', 'b', 'c')))

OR

names(df)[names(df) == 'new_result.1'] <- 'a'

OR

setNames(df, c(...,'a','b','c',...))

I think the problem is that "new_result.1", "new_result.2" and "new_result.3" cannot be found in the data frame; instead, they are referred to together as "new_result". Any idea how I can separate them so that I can refer to the columns individually later? Thanks!

Upvotes: 1

Views: 2081

Answers (2)

zhang jing

Reputation: 181

Please try this:

install.packages("do")
library(do)
df2=Replace(data = df,pattern = '-:,')
col_split(df2$result,',')

Upvotes: 0

Heroka

Reputation: 13139

Following your approach, when we look at 'str(df)' we get this:

> str(df)
'data.frame':   3 obs. of  2 variables:
 $ result    : chr  "A, B-C" "A, C-D" "E, F-G"
 $ new_result: chr [1:3, 1:3] "A" "A" "E" " B" ...

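This is not surprising: str_match returns a matrix, and assigning it with df$new_result <- ... stores the whole matrix as a single column named new_result. The names new_result.1 to new_result.3 only appear when the data frame is printed; they are not real columns.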
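To see the matrix column directly, here is a minimal sketch. It assumes a three-row data frame like the one in the question, and it uses base R's regexec/regmatches as a stand-in for str_match so that it runs without stringr; the variable names m and parts are only illustrative.

```r
# Hypothetical three-row data frame mirroring the question's example.
df <- data.frame(result = c("A, B-C", "A, C-D", "E, F-G"),
                 stringsAsFactors = FALSE)

# Base-R stand-in for str_match(...)[, -1]: one row per string,
# one column per capture group (the full match in column 1 is dropped).
m <- regmatches(df$result, regexec("^(.*),(.*)-(.*)$", df$result))
parts <- do.call(rbind, m)[, -1]

# Assigning a matrix with `$<-` stores it as a single column.
df$new_result <- parts

names(df)            # only "result" and "new_result"
is.matrix(df$new_result)

# The pieces are still reachable, but by indexing inside the matrix.
df$new_result[, 1]
```

This also suggests why names(df)[names(df) == 'new_result.1'] <- 'a' had no effect: no element of names(df) equals 'new_result.1'.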

An approach to fix this is the following:

Create a 'splitted' data frame with the relevant column names:

# Convert the matrix of capture groups into a data frame of character columns
splitted <- data.frame(str_match(df$result, "^(.*),(.*)-(.*)$")[,-1],
                       stringsAsFactors=FALSE)
colnames(splitted) <- paste0("new_result.",1:ncol(splitted))

Then cbind everything together:

df <- cbind(df,splitted)
> str(df)
'data.frame':   3 obs. of  4 variables:
 $ result      : chr  "A, B-C" "A, C-D" "E, F-G"
 $ new_result.1: chr  "A" "A" "E"
 $ new_result.2: chr  " B" " C" " F"
 $ new_result.3: chr  "C" "D" "G"
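Note that new_result.2 keeps the space after the comma (" B", " C", " F"). If that is unwanted, the following sketch trims it and uses the shorter names a, b, c that the question tried to set. It assumes the same example data and again uses base R's regexec/regmatches in place of str_match to avoid the stringr dependency; the names are illustrative.

```r
# Hypothetical input mirroring the question's example.
df <- data.frame(result = c("A, B-C", "A, C-D", "E, F-G"),
                 stringsAsFactors = FALSE)

# Base-R stand-in for str_match(...)[, -1].
m <- regmatches(df$result, regexec("^(.*),(.*)-(.*)$", df$result))
parts <- do.call(rbind, m)[, -1]

# Strip leading/trailing whitespace; `parts[] <-` keeps the matrix shape.
parts[] <- trimws(parts)

# Convert to a data frame and name the columns before binding.
splitted <- as.data.frame(parts, stringsAsFactors = FALSE)
names(splitted) <- c("a", "b", "c")

df <- cbind(df, splitted)
df$b
```

Naming the columns before cbind means each piece becomes an ordinary column that df$a, df$b and df$c can reach.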

Upvotes: 3
