Reputation: 864
I have some DataFrame:
d = {'fruit': ['apple', 'pear', 'peach'], 'values': ['apple_1_0,peach_1_5','pear_1_3','mango_1_0,banana_1_0,pineapple_1_10']}
df = pd.DataFrame(data=d)
df
fruit values
0 apple apple_1_0,peach_1_5
1 pear pear_1_3
2 peach mango_1_0,banana_1_0,pineapple_1_10
The strings in the values
column are comma separated, and I'd like the strings that contain the substring '_1_0'.
Desired output:
fruit values
0 apple apple_1_0
1 pear NaN
2 peach mango_1_0,banana_1_0
Something like this is somewhat close to what I'm trying to do but is painfully slow over ~100,000 rows:
for row in range(len(df)):
print([zero for zero in df['values'].str.split(',', expand=False)[row] if "_1_0" in zero])
['apple_1_0']
[]
['mango_1_0', 'banana_1_0']
Upvotes: 2
Views: 911
Reputation: 784908
Using findall
you may do this:
import numpy as np
import pandas as pd
d = {'fruit': ['apple', 'pear', 'peach'], 'values': ['apple_1_0,peach_1_5','pear_1_3','mango_1_0,banana_1_0,pineapple_1_10']}
df = pd.DataFrame(data=d)
df['values'] = df['values'].str.findall(r'[^,]*_1_0(?=,|$)').apply(','.join).replace('', np.NaN)
print ( df )
fruit values
0 apple apple_1_0
1 pear NaN
2 peach mango_1_0,banana_1_0
Regex [^,]*_1_0(?=,|$)
matches a non-comma string that ends in _1_0
followed by comma or end of string.
We can use a lambda as well:
df['values'] = df['values'].str.findall(r'[^,]*_1_0(?=,|$)').apply(lambda items: ','.join(items) if len(items) > 0 else np.NaN)
Upvotes: 1
Reputation: 28659
This is an alternative, as a list comprehension :
df["values"] = [ ",".join(entry if entry.endswith("1_0")
else ""
for entry in val.split(","))
.rstrip(",")
for val in df["values"]
]
df = df.replace({"": np.nan})
df
fruit values
0 apple apple_1_0
1 pear NaN
2 peach mango_1_0,banana_1_0
Upvotes: 1
Reputation: 2702
Straightforward solution:
import numpy as np
import pandas as pd
d = {'fruit': ['apple', 'pear', 'peach'],
'values': ['apple_1_0,peach_1_5', 'pear_1_3', 'mango_1_0,banana_1_0,pineapple_1_10']}
df = pd.DataFrame(data=d)
new_data = df['values'].str.split(',')
new_data = new_data.apply(lambda lst: [elem for elem in lst if '_1_0' in elem])
new_data = new_data.str.join(",")
new_data = new_data.replace('', np.NaN)
Upvotes: 1
Reputation: 323226
Let us try explode
s = df['values'].str.split(',').explode()
df['New_values'] = s.where(s.str.endswith('_1_0')).dropna().groupby(level=0).agg(','.join)
df
Out[29]:
fruit values New_values
0 apple apple_1_0,peach_1_5 apple_1_0
1 pear pear_1_3 NaN
2 peach mango_1_0,banana_1_0,pineapple_1_10 mango_1_0,banana_1_0
Upvotes: 1