astroboy

Reputation: 197

how to create month and year columns using regex and pandas

Hello Stack Overflow community,

I have the following DataFrame:

code        sum of August 
AA             1000         
BB             4000           
CC             72262          

So there are two columns: ['code', 'sum of August'].

I need to convert this DataFrame so that it has the columns ['month', 'year', 'code', 'sum of August']:

month    year    code    sum of August
   8     2020     AA      1000
   8     2020     BB      4000
   8     2020     CC      72262

The ['sum of August'] column is sometimes named just ['August'] or ['august']. It can also be ['sum of November'], ['November'], or ['november'].

I thought of using regex to extract the month name and convert it to the month number.

Can anyone please help me with this?

Thanks in advance!

Upvotes: 2

Views: 153

Answers (2)

Billy Bonaros

Reputation: 1721

You can do the following:

month = {1:'january',
2:'february',
3:'march',
4:'april',
5:'may',
6:'june',
7:'july',
8:'august',
9:'september',
10:'october',
11:'november',
12:'december'}

Let's say your data frame is called df. Then you can create the column month automatically using the following:

df['month']=[i for i,j in month.items() if j in str.lower(" ".join(df.columns))][0]


  code  sum of August  month
0   AA           1000      8
1   BB           4000      8
2   CC          72262      8

In other words, if a month's name appears anywhere in the column names, the new column is filled with that month's number.
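Since the question asks about regex, here is a minimal sketch of the same lookup done with a regular expression. It assumes an English locale (calendar.month_name is locale-dependent), and the names MONTH_NUMBERS, MONTH_PATTERN and find_month_number are made up for this illustration. Unlike the plain substring check, the word boundaries keep a column such as 'mayor' from matching 'may'.

```python
import calendar
import re

# Map lowercase month names to numbers: {'january': 1, ..., 'december': 12}.
# calendar.month_name[0] is an empty string, so it is skipped.
# Assumption: the process runs with an English locale.
MONTH_NUMBERS = {
    name.lower(): num for num, name in enumerate(calendar.month_name) if name
}

# Match any full month name as a whole word, ignoring case.
MONTH_PATTERN = re.compile(
    r"\b(" + "|".join(MONTH_NUMBERS) + r")\b", re.IGNORECASE
)


def find_month_number(columns):
    """Return the number of the first month named in any column, or None."""
    for column in columns:
        match = MONTH_PATTERN.search(str(column))
        if match:
            return MONTH_NUMBERS[match.group(1).lower()]
    return None
```

With a DataFrame called df, the result could be assigned with df['month'] = find_month_number(df.columns). It returns None when no column contains a month name, so that case is worth checking before assigning.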

Upvotes: 6

hamstercoding

Reputation: 17

It looks like you're trying to convert month names to their numbers, and the column names can be uppercase or lowercase. This might work:

months = ['january','february','march','april','may','june','july','august','september','october','november','december']
monthNum = []  # Collects the month numbers that are found (as strings)
sumOfMonths = ['sum of august','sum of NovemBer']  # Example column names, just to show functionality
for sumOfMonth in sumOfMonths:
  for idx, month in enumerate(months):
    if month in sumOfMonth.lower():  # The lowercased column name contains this month name
      monthNum.append(str(idx + 1))  # The month number is the list index + 1

I hope this helps! Of course, this isn't exactly what you would run: fill in your own variables, and replace append() if you're not collecting the results in a list.
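To tie the loop above to the DataFrame from the question, here is a rough sketch. The function name add_month_year is made up for this example, and the year is passed in as a parameter because the question does not say where it comes from.

```python
import pandas as pd

MONTHS = ['january', 'february', 'march', 'april', 'may', 'june', 'july',
          'august', 'september', 'october', 'november', 'december']


def add_month_year(df, year):
    """Return a copy of df with 'month' and 'year' as the first two columns.

    Assumption: the caller supplies the year; it is not derived from the data.
    """
    month_num = None
    for column in df.columns:
        for idx, month in enumerate(MONTHS):
            # Case-insensitive substring check against each column name
            if month in str(column).lower():
                month_num = idx + 1
    if month_num is None:
        raise ValueError("no month name found in the column names")

    out = df.copy()
    out.insert(0, 'month', month_num)  # scalar is broadcast to every row
    out.insert(1, 'year', year)
    return out


# Sample data copied from the question
df = pd.DataFrame({'code': ['AA', 'BB', 'CC'],
                   'sum of August': [1000, 4000, 72262]})
result = add_month_year(df, year=2020)
```

Here result has the columns ['month', 'year', 'code', 'sum of August'], and the original df is left unchanged because the function works on a copy.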

Upvotes: 1
