Reputation: 211
I am trying to split a pandas column repeatedly. I want to find the string in-between two strings, indefinitely. For example, lets say i have the pandas column from the input below:
import numpy as np
import pandas as pd
data=np.array([["'abc'ad32kn'def'dfannasfl[]12a'ghi'"],
["'jk'adf%#d1asn'lm'dfas923231sassda"],
["'nop'ad&@*-0'qrs'd2&*@^#!!sda'tuv'dasdj_23'w'823a&@'xyz'adfa"]])
df = pd.DataFrame({'Practice Column': data.ravel()})
print(df)
I then, would like to split these string by opening and closing quotes '...'
, and then take what is inside. So, my final output would be:
Can someone help me out? Thanks.
Upvotes: 2
Views: 127
Reputation: 402483
Let's use extractall
here:
df['Practice Column'].str.extractall(r"'(.*?)'").unstack(1)[0].fillna('')
match 0 1 2 3 4
0 abc def ghi
1 jk lm
2 nop qrs tuv w xyz
The pattern '(.*?)'
finds all instances of strings within the single quotes. More info -
' # Match opening quote
( # Open capture group
.*? # Non-greedy match for anything
) # End of capture group
' # Match closing quote
To merge this back with df
, you can either use join
:
v = df.join(df['Practice Column']
.str.extractall(r"'(.*?)'").unstack(1)[0].fillna(''))
Or, assign "Practice Column" back:
v = df['Practice Column'].str.extractall(r"'(.*?)'").unstack(1)[0].fillna('')
v.insert(0, 'Practice Column', df['Practice Column'])
print(v)
match Practice Column 0 1 2 3 4
a 'abc'ad32kn'def'dfannasfl[]12a'ghi' abc def ghi
b 'jk'adf%#d1asn'lm'dfas923231sassda jk lm
c 'nop'ad&@*-0'qrs'd2&*@^#!!sda'tuv'dasdj_23'w'8... nop qrs tuv w xyz
Another solution with a list comprehension (for performance).
import re
p = re.compile("'(.*?)'")
pd.DataFrame([
p.findall(s) for s in df['Practice Column']]).fillna('')
0 1 2 3 4
0 abc def ghi
1 jk lm
2 nop qrs tuv w xyz
This won't work if there are NaNs, so here's a modified version of the solution above. You will need to drop the NaNs first.
pd.DataFrame([
p.findall(s) for s in df['Practice Column'].dropna()]
).fillna('')
0 1 2 3 4
0 abc def ghi
1 jk lm
2 nop qrs tuv w xyz
Upvotes: 3