Reputation: 159
Given the following df:
data = {'Description': ['with lemon', 'lemon', 'and orange', 'orange'],
'Start': ['6', '1', '5', '1'],
'Length': ['5', '5', '6', '6']}
df = pd.DataFrame(data)
print (df)
I would like to substring the "Description" based on what is specified in the other columns as start and length, here the expected output:
data = {'Description': ['with lemon', 'lemon', 'and orange', 'orange'],
'Start': ['6', '1', '5', '1'],
'Length': ['5', '5', '6', '6'],
'Res': ['lemon', 'lemon', 'orange', 'orange']}
df = pd.DataFrame(data)
print (df)
Is there a way to make it dynamic or another compact way?
df['Res'] = df['Description'].str[1:2]
Upvotes: 2
Views: 1714
Reputation: 1
You can use .map
to cycle through the Series. Use split(' ')
to separate the words if there is space and get the last word in the list [-1]
.
df['RES'] = df['Description'].map(lambda x: x.split(' ')[-1])
Upvotes: 0
Reputation: 260490
You need to loop, a list comprehension will be the most efficient (python ≥3.8 due to the walrus operator, thanks @I'mahdi):
df['Res'] = [s[(start:=int(a)-1):start+int(b)] for (s,a,b)
in zip(df['Description'], df['Start'], df['Length'])]
Or using pandas for the conversion (thanks @DaniMesejo):
df['Res'] = [s[a:a+b] for (s,a,b) in
zip(df['Description'],
df['Start'].astype(int)-1,
df['Length'].astype(int))]
output:
Description Start Length Res
0 with lemon 6 5 lemon
1 lemon 1 5 lemon
2 and orange 5 6 orange
3 orange 1 6 orange
df['Res'] = [s[a:a+b] if pd.notna(a) and pd.notna(b) else 'NA'
for (s,a,b) in
zip(df['Description'],
pd.to_numeric(df['Start'], errors='coerce').convert_dtypes()-1,
pd.to_numeric(df['Length'], errors='coerce').convert_dtypes()
)]
output:
Description Start Length Res
0 with lemon 6 5 lemon
1 lemon 1 5 lemon
2 and orange 5 6 orange
3 orange 1 6 orange
4 pinapple xxx NA NA NA
5 orangiie NA NA NA
Upvotes: 3
Reputation: 521103
Given that the fruit name of interest always seems to be the final word in the description column, you might be able to use a regex extract approach here.
data["Res"] = data["Description"].str.extract(r'(\w+)$')
Upvotes: 0