Reputation: 25
I want to extract the requirement number from a column named "Linked Issues" in a dataframe. Each value in this column is a string in the format below:
Linked Issues
Requirement-12345, NewPr-8795, OldPr-78941
MSR-85749, Requirement-74852, NewPr-95418
Requirement-894895
OldPr-85974, NewPr-968572, Requirement-985785
Expected Result:
I want to store the requirement number in a new column, like below:
Requirement Number
Requirement-12345
Requirement-74852
Requirement-894895
Requirement-985785
Upvotes: 1
Views: 77
Reputation: 862396
Use Series.str.extract with the regex r'(Requirement-\d+)', which matches the literal text Requirement- followed by one or more digits, to get the first matched value per row:
df['new'] = df['Linked Issues'].str.extract(r'(Requirement-\d+)')
print (df)
                                   Linked Issues                 new
0     Requirement-12345, NewPr-8795, OldPr-78941   Requirement-12345
1      MSR-85749, Requirement-74852, NewPr-95418   Requirement-74852
2                             Requirement-894895  Requirement-894895
3  OldPr-85974, NewPr-968572, Requirement-985785  Requirement-985785
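For anyone who wants to try this without the original data, here is a minimal self-contained sketch that rebuilds the sample frame from the question. The column name "Requirement Number" comes from the expected result; passing expand=False is my own choice, so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Linked Issues": [
        "Requirement-12345, NewPr-8795, OldPr-78941",
        "MSR-85749, Requirement-74852, NewPr-95418",
        "Requirement-894895",
        "OldPr-85974, NewPr-968572, Requirement-985785",
    ]
})

# expand=False returns a Series instead of a one-column DataFrame;
# only the first match in each row is kept
df["Requirement Number"] = df["Linked Issues"].str.extract(
    r"(Requirement-\d+)", expand=False
)

print(df["Requirement Number"].tolist())
# ['Requirement-12345', 'Requirement-74852', 'Requirement-894895', 'Requirement-985785']
```

Rows with no match would get NaN in the new column.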
If a row can contain multiple matching values, use Series.str.findall with Series.str.join:
df['new'] = df['Linked Issues'].str.findall(r'(Requirement-\d+)').str.join(', ')
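A small sketch of how this behaves, using hypothetical rows that are not in the question: the first has two Requirement entries and the last has none. The pattern is written without a capture group here, which gives the same result with findall.

```python
import pandas as pd

# Hypothetical rows for illustration only (not from the question):
# two matches, one match, and no match
s = pd.Series([
    "Requirement-12345, NewPr-8795, Requirement-67890",
    "MSR-85749, Requirement-74852",
    "NewPr-95418",
])

# findall returns a list of all matches per row; str.join glues each list together
joined = s.str.findall(r"Requirement-\d+").str.join(", ")

print(joined.tolist())
# ['Requirement-12345, Requirement-67890', 'Requirement-74852', '']
```

Note that a row with no match ends up as an empty string here, not NaN as with str.extract.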
Upvotes: 1