Fold_In_The_Cheese
Fold_In_The_Cheese

Reputation: 61

Replacing certain values in a column to match by another column

I am new to Python so apologies if it comes across as a really easy question. I have the following example dataframe below (the real one has thousands of rows)

df = {'City': ['London','Tokyo','London','Paris','Paris','London','Tokyo','Tokyo', 'Paris','Berlin','Berlin','Berlin'], 
      'Code': ['367','812','367','964','964','BN611','812','Y366','Z167','L715','412','L715']}
df = pd.DataFrame(data=df)

What I am trying to do is substitute all the instances where the code for a city starts with a letter with its numeric equivalent (for example, the code for each line where London appears should always be 367 instead of 367 and BN611).

Is there a way to do this easily without having to write a code for each city separately? The dataframe I have contains a few hundred more cities and I will need to apply this to every one of them. The codes for each city are always the same (for example London always has the code 367 and BN611, etc.)

Thank you

Upvotes: 2

Views: 125

Answers (1)

mozway
mozway

Reputation: 261015

You can mask the invalid codes and use a groupby.transform to replace with the valid ones:

# which codes start with a letter?
m = df['Code'].str.match(r'[A-Z]')
# mask them and replace each code by the first valid one per city
df['Code'] = df['Code'].mask(m).groupby(df['City']).transform('first')

output (as new column "Code2" for clarity):

      City   Code Code2
0   London    367   367
1    Tokyo    812   812
2   London    367   367
3    Paris    964   964
4    Paris    964   964
5   London  BN611   367
6    Tokyo    812   812
7    Tokyo   Y366   812
8    Paris   Z167   964
9   Berlin   L715   412
10  Berlin    412   412
11  Berlin   L715   412

Upvotes: 2

Related Questions