John12
John12

Reputation: 41

Extract data from multiple bracket string in Pandas and create new table

I am trying to build a 2 x 24 table in pandas with the following data below:

d.iloc[0:2] = [[0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L], [0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 2L, 2L, 0L, 0L, 0L]]

Basically, the first sub-bracket represents 24 hour data for one day in January, and the second sub-bracket for Febuary. I am looking to structure the 2x24 table (without 'L') in the following fashion:

    1 2 3 4 5 6 7 8 9 10 11 12 ... 24
Jan 0 0 0 0 0 0 0 0 0 1  1  1  ... 0
Feb 0 0 0 0 0 0 0 0 0 1  1  1  ... 0

What I find challenging is stripping (.strip), splitting, and copying the data to a new dataframe structure. I often find the original structure on online dataframes with 12 sub-brackets (one for each month). I included d.iloc[0,2] because I am going to apply the function to all elements in column 2 with a for loop. Thank you for your precious help.

Upvotes: 0

Views: 285

Answers (1)

jezrael
jezrael

Reputation: 862521

I think you can use DataFrame.from_records with apply str.strip:

import pandas as pd
import numpy as np

a = [['0L', '0L', '0L', '0L', '0L', '0L', '0L', '0L', '0L', '1L', '1L', '1L', '1L', '1L', '0L', '0L', '0L', '1L', '1L', '1L', '1L', '0L', '0L', '0L'], 
     ['0L', '0L', '0L', '0L', '0L', '0L', '0L', '0L', '0L', '1L', '1L', '1L', '1L', '1L', '0L', '0L', '0L', '1L', '1L', '2L', '2L', '0L', '0L', '0L']]

idx = ['Jan','Feb']
df = pd.DataFrame.from_records(a, index=idx).apply(lambda x: x.str.strip('L').astype(int))
print (df)
     0   1   2   3   4   5   6   7   8   9  ...  14  15  16  17  18  19  20  \
Jan   0   0   0   0   0   0   0   0   0   1 ...   0   0   0   1   1   1   1   
Feb   0   0   0   0   0   0   0   0   0   1 ...   0   0   0   1   1   2   2   

     21  22  23  
Jan   0   0   0  
Feb   0   0   0  

[2 rows x 24 columns]

More general solution with generating months names by dt.strftime:

print (pd.Series(range(1,len(a) + 1)))
0    1
1    2
dtype: int32

idx = pd.to_datetime(pd.Series(range(1,len(a) + 1)), format='%m').dt.strftime('%b')
0    Jan
1    Feb
dtype: object

df = pd.DataFrame.from_records(a, index=idx).apply(lambda x: x.str.strip('L').astype(int))
print (df)
     0   1   2   3   4   5   6   7   8   9  ...  14  15  16  17  18  19  20  \
Jan   0   0   0   0   0   0   0   0   0   1 ...   0   0   0   1   1   1   1   
Feb   0   0   0   0   0   0   0   0   0   1 ...   0   0   0   1   1   2   2   

     21  22  23  
Jan   0   0   0  
Feb   0   0   0 

If need split values first:

b = [['0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L'], 
     ['0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 2L, 2L, 0L, 0L, 0L']]

idx = pd.to_datetime(pd.Series(range(1,len(a) + 1)), format='%m').dt.strftime('%b')

df1 = pd.DataFrame.from_records(b, index=idx)
        .iloc[:,0]
        .str.split(', ', expand=True)
        .replace({'L':''}, regex=True)
        .astype(int)
print (df1)

     0   1   2   3   4   5   6   7   8   9  ...  14  15  16  17  18  19  20  \
Jan   0   0   0   0   0   0   0   0   0   1 ...   0   0   0   1   1   1   1   
Feb   0   0   0   0   0   0   0   0   0   1 ...   0   0   0   1   1   2   2   

     21  22  23  
Jan   0   0   0  
Feb   0   0   0  

[2 rows x 24 columns]

Upvotes: 1

Related Questions