CJ90
CJ90

Reputation: 99

Extracting columns from a dataframe into new dataframes by string contained in column name

I have a dataframe which shows country power generation and load data (data can be downloaded from https://data.open-power-system-data.org/time_series/2019-06-05 and I am working with the 60 minute set). From this datafeame I want to extract columns related to each country in the dataset and create a new dataframe for each country and assign the dataframe the abbreviation for the respective country.

Up to now I have read in the raw data and got a list of unique countries from the column headers in the dataframe and saved them to a list called abbv.

I am trying to use the abbv list to create a dataframe for each abbreviation (each i in abbv) and populate the created dataframes with columns from the original dataframe which contain the abbreviation (i in abbv) for each country.

I have tried a for loop up to now, but am not quite sure if this is the correct method or if I am attempting to use the loop in the correct way. Any help would be appreciated. I am stuck on the nested for loop and not sure where to go from there - I know the code does not run below as is, I have left errors to try to explain my thought process to solve this problem. Thanks.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#read in
data_1h = pd.read_csv('/Users/xx/Downloads/opsd-time_series-2019-06-05/time_series_60min_singleindex.csv')

#get country abbreviations
abbv = [(i[:2]) for i in data_1h.columns]
abbv = pd.unique(abbv)

#create dataframe for each country
for i in range(len(abbv)):
    for i in abbv:
        i = pd.DataFrame()
        columns = [col for col in data_1h.columns if i in col]
        i = {columns: data_1h.column}

Upvotes: 0

Views: 101

Answers (1)

mak4515
mak4515

Reputation: 56

My suggestion would be to split the column headers on the first "_" character and then turn this into a multi-level index that has the country abbreviation as the first level. You can then either (1) split each dataframe out individually or (2) just select whichever country you want as you normally would with any other dataframe.

In [3]: df.head()
Out[3]:
                                               AL_load_actual_entsoe_power_statistics  ...  UA_west_load_forecast_entsoe_transparency
utc_timestamp        cet_cest_timestamp                                                ...                              
2004-12-31T23:00:00Z 2005-01-01T00:00:00+0100                                     NaN  ...                                        NaN
2005-01-01T00:00:00Z 2005-01-01T01:00:00+0100                                     NaN  ...                                        NaN
2005-01-01T01:00:00Z 2005-01-01T02:00:00+0100                                     NaN  ...                                        NaN
2005-01-01T02:00:00Z 2005-01-01T03:00:00+0100                                     NaN  ...                                        NaN
2005-01-01T03:00:00Z 2005-01-01T04:00:00+0100                                     NaN  ...                                        NaN

[5 rows x 391 columns]

In [4]: df.columns = df.columns.str.split('_', n=1, expand=True)

In [5]: df
Out[5]:
                                                                               AL  ...                                     UA
                                              load_actual_entsoe_power_statistics  ... west_load_forecast_entsoe_transparency
utc_timestamp        cet_cest_timestamp                                            ...                                  
2004-12-31T23:00:00Z 2005-01-01T00:00:00+0100                                 NaN  ...                                    NaN
2005-01-01T00:00:00Z 2005-01-01T01:00:00+0100                                 NaN  ...                                    NaN
2005-01-01T01:00:00Z 2005-01-01T02:00:00+0100                                 NaN  ...                                    NaN
2005-01-01T02:00:00Z 2005-01-01T03:00:00+0100                                 NaN  ...                                    NaN
2005-01-01T03:00:00Z 2005-01-01T04:00:00+0100                                 NaN  ...                                    NaN
...                                                                           ...  ...                                    ...
2019-04-30T19:00:00Z 2019-04-30T21:00:00+0200                                 NaN  ...                                  487.0
2019-04-30T20:00:00Z 2019-04-30T22:00:00+0200                                 NaN  ...                                  447.0
2019-04-30T21:00:00Z 2019-04-30T23:00:00+0200                                 NaN  ...                                  410.0
2019-04-30T22:00:00Z 2019-05-01T00:00:00+0200                                 NaN  ...                                  400.0
2019-04-30T23:00:00Z 2019-05-01T01:00:00+0200                                 NaN  ...                                    NaN

[125593 rows x 391 columns]


In [7]: df['AL']
Out[7]:
                                               load_actual_entsoe_power_statistics
utc_timestamp        cet_cest_timestamp
2004-12-31T23:00:00Z 2005-01-01T00:00:00+0100                                  NaN
2005-01-01T00:00:00Z 2005-01-01T01:00:00+0100                                  NaN
2005-01-01T01:00:00Z 2005-01-01T02:00:00+0100                                  NaN
2005-01-01T02:00:00Z 2005-01-01T03:00:00+0100                                  NaN
2005-01-01T03:00:00Z 2005-01-01T04:00:00+0100                                  NaN
...                                                                            ...
2019-04-30T19:00:00Z 2019-04-30T21:00:00+0200                                  NaN
2019-04-30T20:00:00Z 2019-04-30T22:00:00+0200                                  NaN
2019-04-30T21:00:00Z 2019-04-30T23:00:00+0200                                  NaN
2019-04-30T22:00:00Z 2019-05-01T00:00:00+0200                                  NaN
2019-04-30T23:00:00Z 2019-05-01T01:00:00+0200                                  NaN

[125593 rows x 1 columns]

Upvotes: 2

Related Questions