Super Ultra Noob

Transforming yearwise data using pandas

I have a dataframe that looks like this:

            Temp
Date    
1981-01-01  20.7
1981-01-02  17.9
1981-01-03  18.8
1981-01-04  14.6
1981-01-05  15.8
...     ...
1981-12-27  15.5
1981-12-28  13.3
1981-12-29  15.6
1981-12-30  15.2
1981-12-31  17.4

365 rows × 1 columns

And I want to transform it so that it looks like this:

        1981    1982    1983    1984    1985    1986    1987    1988    1989    1990
0       20.7    17.0    18.4    19.5    13.3    12.9    12.3    15.3    14.3    14.8
1       17.9    15.0    15.0    17.1    15.2    13.8    13.8    14.3    17.4    13.3
2       18.8    13.5    10.9    17.1    13.1    10.6    15.3    13.5    18.5    15.6
3       14.6    15.2    11.4    12.0    12.7    12.6    15.6    15.0    16.8    14.5
4       15.8    13.0    14.8    11.0    14.6    13.7    16.2    13.6    11.5    14.3
...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
360     15.5    15.3    13.9    12.2    11.5    14.6    16.2    9.5     13.3    14.0
361     13.3    16.3    11.1    12.0    10.8    14.2    14.2    12.9    11.7    13.6
362     15.6    15.8    16.1    12.6    12.0    13.2    14.3    12.9    10.4    13.5
363     15.2    17.7    20.4    16.0    16.3    11.7    13.3    14.8    14.4    15.7
364     17.4    16.3    18.0    16.4    14.4    17.2    16.7    14.1    12.7    13.0

My attempt:

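groups = df.groupby(df.index.year)   # one group per calendar year
keys = groups.groups.keys()          # the years, e.g. 1981 ... 1990
years = pd.DataFrame()
for key in keys:
    # place each year's temperatures side by side as a new column
    years[key] = groups.get_group(key)['Temp'].values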

Question: The code above gives me the output I want, but is there a more efficient way to do this transformation?

I can't post the whole dataset because the dataframe has 3650 rows, but you can download the CSV file (60.6 kB) for testing from here.
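Both approaches rely on `Date` being parsed into a `DatetimeIndex`, so that `df.index.year` exists. A minimal sketch of loading the data that way, assuming the CSV has a `Date` column and a `Temp` column; a few inline rows stand in for the real file, which is not reproduced here:

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV (assumed columns: 'Date' and 'Temp')
csv_text = """Date,Temp
1981-01-01,20.7
1981-01-02,17.9
1981-01-03,18.8
"""

# index_col + parse_dates=True turn 'Date' into a DatetimeIndex,
# which is what df.index.year and df.index.dayofyear need
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
print(df.index.year.tolist())
```

With the downloaded file, the `io.StringIO(...)` argument would be replaced by the file's path (the filename is not given in the question).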

Answers (1)

Henry Ecker

Try extracting the year and the day of year (dayofyear) from the index, then pivoting:

import pandas as pd
import numpy as np

# Create Random Data
dr = pd.date_range(pd.to_datetime("1981-01-01"), pd.to_datetime("1982-12-31"))
df = pd.DataFrame(np.random.randint(1, 100, size=dr.shape),
                  index=dr,
                  columns=['Temp'])

# Get Year and Day of Year
df['year'] = df.index.year
df['day'] = df.index.dayofyear

# Pivot
p = df.pivot(index='day', columns='year', values='Temp')

print(p)

p:

year  1981  1982
day             
1       38    85
2       51    70
3       76    61
4       71    47
5       44    76
..     ...   ...
361     23    22
362     42    64
363     84    22
364     26    56
365     67    73
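To get exactly the layout shown in the question (rows labelled 0 to 364, no axis names), one option, sketched here as an alternative rather than as part of the answer above, is to build a (day, year) `MultiIndex`, `unstack` the year level, and then drop the day labels. Seeded random data stands in for the real CSV:

```python
import numpy as np
import pandas as pd

# Random daily data standing in for the real CSV (two non-leap years)
dr = pd.date_range("1981-01-01", "1982-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"Temp": rng.integers(1, 100, size=len(dr))}, index=dr)

# Series indexed by (day of year, year); unstacking 'year' moves years into columns
s = pd.Series(
    df["Temp"].to_numpy(),
    index=pd.MultiIndex.from_arrays(
        [df.index.dayofyear, df.index.year], names=["day", "year"]
    ),
)
wide = s.unstack("year")

# Match the question's layout: 0-based row labels and no 'year' columns name
wide_plain = wide.reset_index(drop=True).rename_axis(None, axis=1)
print(wide_plain.head())
```

This produces the same values as the `pivot` call above; the last line only changes the labels.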

Run-Time via Timeit

import timeit

setup = '''
import pandas as pd
import numpy as np

# Create Random Data
dr = pd.date_range(pd.to_datetime("1981-01-01"), pd.to_datetime("1983-12-31"))
df = pd.DataFrame(np.random.randint(1, 100, size=dr.shape),
                  index=dr,
                  columns=['Temp'])'''

pivot = '''
df['year'] = df.index.year
df['day'] = df.index.dayofyear
p = df.pivot(index='day', columns='year', values='Temp')'''

groupby_for = '''
groups=df.groupby(df.index.year)
keys=groups.groups.keys()
years=pd.DataFrame()
for key in keys:
    years[key]=groups.get_group(key)['Temp'].values'''

if __name__ == '__main__':
    print("Pivot")
    print(timeit.timeit(setup=setup, stmt=pivot, number=1000))
    print("Groupby For")
    print(timeit.timeit(setup=setup, stmt=groupby_for, number=1000))

Output (total seconds for 1000 runs of each statement):

Pivot
1.598973
Groupby For
2.3967995999999996

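*Additional note: the groupby/for option will not work if the data includes a leap year, because 1984 has 366 days instead of 365, and assigning its 366 values to a 365-row DataFrame raises a ValueError. Pivot works regardless; non-leap years simply get NaN on day 366.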
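The leap-year behaviour can be checked with a small sketch on made-up data for 1983-1985 (1984 is a leap year). It runs the question's groupby/for loop, the day-of-year pivot, and a hypothetical variant (not from the answer) that keys rows on the calendar month-day string from `strftime("%m-%d")`, so that, for example, March 1 stays on the same row in every year:

```python
import numpy as np
import pandas as pd

# Made-up data: 1983-1985, where 1984 is a leap year with 366 days
dr = pd.date_range("1983-01-01", "1985-12-31", freq="D")
df = pd.DataFrame({"Temp": np.arange(len(dr), dtype=float)}, index=dr)

# 1) The groupby/for loop from the question
groups = df.groupby(df.index.year)
years = pd.DataFrame()
try:
    for key in groups.groups.keys():
        years[key] = groups.get_group(key)["Temp"].values
    loop_failed = False
except ValueError as exc:  # 1984 has 366 values but the frame has 365 rows
    loop_failed = True
    print("groupby/for failed:", exc)

# 2) Pivot on day of year: no error, but 1983/1985 are NaN on day 366
tmp = df.assign(year=df.index.year, day=df.index.dayofyear)
by_doy = tmp.pivot(index="day", columns="year", values="Temp")
print(by_doy.tail(2))

# 3) Hypothetical variant: rows keyed by "MM-DD", so calendar dates line up;
#    zero-padded strings sort in calendar order
tmp["md"] = df.index.strftime("%m-%d")
by_md = tmp.pivot(index="md", columns="year", values="Temp")
print(by_md.loc[["02-28", "02-29", "03-01"]])
```

The trade-off: with day of year, every date from March 1 onward sits one row lower in a leap year than in the other years, while the month-day key keeps dates aligned at the cost of a `02-29` row that is NaN in non-leap years.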