Reputation: 398

Rename Columns in Python using Regular Expressions

I have a data set that has columns for number of units sold in a given month - the problem being that the monthly units columns are named in MM/yyyy format, meaning that I have 12 columns of units information per record.

So for instance, my data looks like:

ProductID  |  CustomerID  |  04/2018  |  03/2018  |  02/2018  |  FileDate  |
a1032      |  c1576       |     36    |     12    |     19    | 04/20/2018 |

What causes this to be problematic is that a new file comes in every month, with the same file name, but different column headers for the units information based on the last 12 months.

What I would like to do, is rename the monthly units columns to Month1, Month2, Month3... based on a simple regex such as ([0-9]*)/([0-9]*) that will result in the output:

ProductID  |  CustomerID  |   Month1  |   Month2  |   Month3  |  FileDate  |
a1032      |  c1576       |     36    |     12    |     19    | 04/20/2018 |

I know that this should be possible using Python, but as I have never used Python before (I am an old .Net developer) I honestly have no idea how to achieve this.

I have done a bit of research on renaming columns in Python, but none of them mentioned pattern matching to rename a column, eg:

 df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})

UPDATE: The data that I am showing in my example is only a subset of the columns; total, in my data set I have 120 columns, only 12 of which need to be renamed, this is why I thought that regex might be the simplest way to go.

Upvotes: 0

Answers (3)

ujhuyz0110

Reputation: 383

Since rename could take a function as a mapper, we could define a customized function which returns a new column name in the new format if the old column name matches regex; otherwise, returns the same column name. For example,

import re


def mapper(old_name):
    match = re.match(r'([0-9]*)/([0-9]*)', old_name)
    if match:
        return 'Month{}'.format(int(match.group(1)))
    return old_name

df = df.rename(columns=mapper)

Upvotes: 1

Tyler K

Reputation: 338

import re

# regex pattern
pattern = re.compile("([0-9]*)/([0-9]*)")

# get headers as list
headers = list(df)

# apply regex
months = 1
for index, header in enumerate(headers):
    if pattern.match(header):
        headers[index] = 'Month{}'.format(months)
        months += 1

# set new list as column headers
df.columns = headers

Upvotes: 1

Acccumulation

Reputation: 3591

If you have some set names that you want to convert to, then rather than using rename, it might easier to just pass a new list to the df.columns attribute

df.columns = ['ProductID','CustomerID']+['Month{}'.format(i) for i in range(12)]+['FileDate']

If you want to use rename, if you can write a function find_new_name that does the conversion you want for a single name, you can rename an entire list old_names with

df.rename(columns = {oldname:find_new_name(old_name) for old_name in old_names})

Or if you have a function that takes a new name and figures out what old name corresponds to it, then it would be

df.rename(columns = {find_old_name(new_name):new_name for new_name in new_names})

You can also do

for new_name in new_names:
    old_name = find_new_name(old_name)
    df[new_name] = df[old_name]

This will copy the data into new columns with the new names rather than renaming, so you can then subset to just the columns you want.

Upvotes: 1

Rename Columns in Python using Regular Expressions

Answers (3)

Related Questions