Reputation: 175
I have a data set:
crimes<-data.frame(x=c("Smith", "Jones"), charges=c("murder, first degree-G, manslaughter-NG", "assault-NG, larceny, second degree-G"))
I'm using tidyr::separate to split the charges column on a match with "G,"
crimes<-separate(crimes, charges, into=c("v1","v2"), sep="G,")
This splits my columns, but removes the separator "G,". I want to retain the "G," in the resulting column split.
My desired output is:
x      v1                      v2
Smith  murder, first degree-G  manslaughter-NG
Jones  assault-NG              larceny, second degree-G
Any suggestions welcome.
Upvotes: 12
Views: 11769
Reputation: 2965
Replace <yourRegexPattern> with your regex.
If you want the 'sep' in the left column (look behind)
dataframe %>% separate(column_to_sep, into = c("newCol1", "newCol2"), sep="(?<=<yourRegexPattern>)")
If you want the 'sep' in the right column (look ahead)
dataframe %>% separate(column_to_sep, into = c("newCol1", "newCol2"), sep="(?=<yourRegexPattern>)")
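As a concrete illustration of the two patterns above, here is a minimal sketch on a made-up data frame. The names df, key, l, r, left and right are hypothetical and not from the question, and it assumes a tidyr version whose separate() accepts Perl-style look-arounds in sep, which is what the patterns above rely on.

```r
library(tidyr)

# Hypothetical toy data: split each value around the "-"
df <- data.frame(key = c("a-1", "b-2"), stringsAsFactors = FALSE)

# Look-behind: split just after the "-", so it stays in the left column
left <- separate(df, key, into = c("l", "r"), sep = "(?<=-)")

# Look-ahead: split just before the "-", so it stays in the right column
right <- separate(df, key, into = c("l", "r"), sep = "(?=-)")
```

With the look-behind, the "-" should end up at the end of column l; with the look-ahead, it should end up at the start of column r.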
Also note that when you are trying to separate a word from a group of digits (e.g. August1990 to August and 1990), the look-ahead matches before every digit, so you need extra="merge" to keep the whole group of digits together in the last column.
Example:
dataframe %>% separate(column_to_sep, into = c("newCol1", "newCol2"), sep="(?=[[:digit:]])", extra="merge")
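A runnable version of the same idea, as a sketch on made-up data (dates, when, month, year and out are hypothetical names, not from the question):

```r
library(tidyr)

# Hypothetical example data
dates <- data.frame(when = c("August1990", "March2004"),
                    stringsAsFactors = FALSE)

# The look-ahead matches before every digit, so the string could be cut
# into more than two pieces; extra = "merge" keeps everything after the
# first split together in the last column.
out <- separate(dates, when, into = c("month", "year"),
                sep = "(?=[[:digit:]])", extra = "merge")
```

Without extra="merge", separate() would be expected to warn about additional pieces and keep only the first digit in the year column.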
Upvotes: 12
Reputation: 4220
UPDATE
This is what you asked for. Keep in mind that your data is not tidy (both V1 and V2 hold more than one variable in each column).
A<-separate(crimes,charges,into=c("V1","V2"),sep = "(?<=G,)")
A
      x                      V1                        V2
1 Smith murder, first degree-G,           manslaughter-NG
2 Jones             assault-NG,  larceny, second degree-G
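Note that this split leaves the comma at the end of V1 and a leading space at the start of V2. If the goal is the exact output shown in the question (no trailing comma, no leading space), one possible variation is to put only the G in the look-behind and let the ", " itself be consumed as the separator. This is a sketch under that assumption, not something the question or the tidyr docs prescribe; B is a hypothetical name.

```r
library(tidyr)

crimes <- data.frame(
  x = c("Smith", "Jones"),
  charges = c("murder, first degree-G, manslaughter-NG",
              "assault-NG, larceny, second degree-G"),
  stringsAsFactors = FALSE
)

# Split on a comma-plus-space only when it is preceded by "G":
# the "G" is kept (it is inside the look-behind), the ", " is dropped.
B <- separate(crimes, charges, into = c("v1", "v2"), sep = "(?<=G), ")
```

The commas inside "murder, first degree-G" and "larceny, second degree-G" are not preceded by a G, so they should be left alone.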
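An easier way to keep the "G" or "NG" is to use sep=", ", as alistaire suggested. Note that this assumes no charge contains a comma of its own (for example "murder-G, manslaughter-NG"); with the data as posted, ", " also splits inside "murder, first degree-G".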
A<-separate(crimes, charges, into=c("v1","v2"), sep = ', ')
This gives
      x         v1              v2
1 Smith   murder-G manslaughter-NG
2 Jones assault-NG       larceny-G
If you want to keep separating your data.frame (on the "-"):
separate(A, v1, into = c("v3","v4"), sep = "-")
that gives
      x      v3 v4              v2
1 Smith  murder  G manslaughter-NG
2 Jones assault NG       larceny-G
You'll need to do the same for the v2 column. I don't know whether you want to keep separating; please post your expected output so I can make my answer more specific.
Upvotes: 7