Reputation: 419
I have a DataFrame in which column B holds two substrings extracted from column A with a regex:
df['B'] = df['A'].str.findall(r'([S][\d]|[V][\d]{3})')
A B
1 R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593 ['S1', 'V087']
2 R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105 ['S1', 'V023']
3 R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033 ['S1', 'V155']
I would like to get rid of the lists in column B and join the two strings in each list with '_'.
The result would look like this:
A B
1 R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593 S1_V087
2 R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105 S1_V023
3 R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033 S1_V155
The other thing I want to extract from column A with a regex is the part of the string shown in column C below.
I have no idea what the regex would look like!
A C
1 R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593 S1_1785984
2 R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105 S1_5896589
3 R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033 S1_2541236
Sorry for the double question; I would appreciate your help!
Upvotes: 1
Views: 54
Reputation: 82785
Use: str.join("_")
Ex:
df['B'] = df['B'].str.join("_")
print(df['B'])
Output:
0 S1_V087
1 S1_V023
2 S1_V155
Name: B, dtype: object
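For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies findall followed by str.join. The frame construction is my own assumption about how the data is loaded; only the three sample strings come from the question.

```python
import pandas as pd

# Sample rows copied from the question; building the frame this way is an assumption.
df = pd.DataFrame({"A": [
    "R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593",
    "R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105",
    "R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033",
]})

# findall returns one list of matches per row, e.g. ['S1', 'V087'].
df["B"] = df["A"].str.findall(r"(S\d|V\d{3})")

# str.join concatenates the elements of each list with the given separator.
df["B"] = df["B"].str.join("_")

print(df["B"].tolist())  # ['S1_V087', 'S1_V023', 'S1_V155']
```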
To extract content using regex (a raw string avoids escape warnings, and expand=False returns a Series):
df['C'] = "S1_" + df['A'].str.extract(r"(\d+)_\d+$", expand=False)
print(df['C'])
Output:
0 S1_1785984
1 S1_5896589
2 S1_2541236
Name: C, dtype: object
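The line above hardcodes the "S1_" prefix. If the S part can vary between rows, one possible variation is to capture both pieces with two groups and join them afterwards. This is only a sketch: the pattern assumes the S token is always a single digit surrounded by underscores and that the wanted number is always the second-to-last field.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"A": [
    "R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593",
    "R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105",
    "R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033",
]})

# Group 1: the S token (assumed to be 'S' plus one digit).
# Group 2: the second-to-last run of digits before the end of the string.
parts = df["A"].str.extract(r"_(S\d)_.*_(\d+)_\d+$")

# With two capture groups, extract returns a DataFrame with columns 0 and 1.
df["C"] = parts[0] + "_" + parts[1]

print(df["C"].tolist())  # ['S1_1785984', 'S1_5896589', 'S1_2541236']
```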
Upvotes: 4
Reputation: 4618
For the first one, you just have to apply '_'.join to column B:
df['B'] = df['B'].apply('_'.join)
For the second one, you don't need a regex: split on '_', pick the values you need, and join them again:
df['C'] = df['A'].apply(lambda x: '_'.join([x.split('_')[4], x.split('_')[-2]]))
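The same idea can also be written without a lambda by splitting once into columns. This is a sketch, not a drop-in replacement: it assumes every row has the same number of '_'-separated fields, as the sample rows do.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"A": [
    "R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593",
    "R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105",
    "R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033",
]})

# Split each string once; expand=True puts every field in its own column.
fields = df["A"].str.split("_", expand=True)

# Field 4 is the S token; the second-to-last column holds the wanted number.
df["C"] = fields[4] + "_" + fields.iloc[:, -2]

print(df["C"].tolist())  # ['S1_1785984', 'S1_5896589', 'S1_2541236']
```

Splitting once per row avoids calling split twice, which the lambda version does.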
Upvotes: 2