Reputation: 137
I have the following excerpt of a dataframe df[2]:
CAR DRIVER START PIT 1 PIT 2 PIT 3 PIT 4 PIT 5
0 31 OCO In In (2) C3n (3) C2n (37) 0 0
1 5 VET In In (2) C3n (3) C2n (36) 0 0
2 44 HAM In In (2) C3u (4) C2n (19) C3n (47) 0
3 55 SAI In In (2) C3n (3) C2n (32) 0 0
4 14 ALO In In (2) C3n (3) C2n (39) 0 0
I am trying to change the "PIT.." columns (The 3rd column onwards, by inspection) values, for e.g. from "C3n (3)" to "C3n-3". I have written the following code to achieve this:
df[2].iloc[:, 3:] = df[2].iloc[:, 3:].apply(lambda x: re.compile(r'\w+').findall(str(x))
[0] + "-" + re.compile(r'\w+').findall(str(x))[1] if len(str(x)) > 1 else 0)
The output from the following code is:
CAR DRIVER START PIT 1 PIT 2 PIT 3 PIT 4 PIT 5
0 31 OCO In 0-In 0-C3n 0-C2n 0-0 0-0
1 5 VET In 0-In 0-C3n 0-C2n 0-0 0-0
2 44 HAM In 0-In 0-C3n 0-C2n 0-0 0-0
3 55 SAI In 0-In 0-C3n 0-C2n 0-0 0-0
4 14 ALO In 0-In 0-C3n 0-C2n 0-0 0-0
,which is incorrect, as all column values are the same. The regex I have implemented in the lambda is correct when tested as an independent function but not when I apply it to the dataframe.
Is this a problem with my dataframe slicing, as that is the only problem I am unsure of? Any help would be appreciated.
Upvotes: 0
Views: 76
Reputation: 120559
You can use filter
to select PIT..
columns and replace like this:
>>> df.filter(like='PIT').replace(r' \((\d+)\)', r'-\1', regex=True)
PIT 1 PIT 2 PIT 3 PIT 4 PIT 5
0 In-2 C3n-3 C2n-37 0 0
1 In-2 C3n-3 C2n-36 0 0
2 In-2 C3u-4 C2n-19 C3n-47 0
3 In-2 C3n-3 C2n-32 0 0
4 In-2 C3n-3 C2n-39 0 0
For the regex, refer to the documentation for the (\d+)
and \1
Upvotes: 6