Reputation: 41
I am trying to build a 2 x 24 table in pandas with the following data below:
d.iloc[0:2] = [[0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L], [0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 2L, 2L, 0L, 0L, 0L]]
Basically, the first sub-bracket represents 24 hour data for one day in January, and the second sub-bracket for Febuary. I am looking to structure the 2x24 table (without 'L') in the following fashion:
1 2 3 4 5 6 7 8 9 10 11 12 ... 24
Jan 0 0 0 0 0 0 0 0 0 1 1 1 ... 0
Feb 0 0 0 0 0 0 0 0 0 1 1 1 ... 0
What I find challenging is stripping (.strip), splitting, and copying the data to a new dataframe structure. I often find the original structure on online dataframes with 12 sub-brackets (one for each month).
I included d.iloc[0,2]
because I am going to apply the function to all elements in column 2 with a for
loop.
Thank you for your precious help.
Upvotes: 0
Views: 285
Reputation: 862521
I think you can use DataFrame.from_records
with apply str.strip
:
import pandas as pd
import numpy as np
a = [['0L', '0L', '0L', '0L', '0L', '0L', '0L', '0L', '0L', '1L', '1L', '1L', '1L', '1L', '0L', '0L', '0L', '1L', '1L', '1L', '1L', '0L', '0L', '0L'],
['0L', '0L', '0L', '0L', '0L', '0L', '0L', '0L', '0L', '1L', '1L', '1L', '1L', '1L', '0L', '0L', '0L', '1L', '1L', '2L', '2L', '0L', '0L', '0L']]
idx = ['Jan','Feb']
df = pd.DataFrame.from_records(a, index=idx).apply(lambda x: x.str.strip('L').astype(int))
print (df)
0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 \
Jan 0 0 0 0 0 0 0 0 0 1 ... 0 0 0 1 1 1 1
Feb 0 0 0 0 0 0 0 0 0 1 ... 0 0 0 1 1 2 2
21 22 23
Jan 0 0 0
Feb 0 0 0
[2 rows x 24 columns]
More general solution with generating months names by dt.strftime
:
print (pd.Series(range(1,len(a) + 1)))
0 1
1 2
dtype: int32
idx = pd.to_datetime(pd.Series(range(1,len(a) + 1)), format='%m').dt.strftime('%b')
0 Jan
1 Feb
dtype: object
df = pd.DataFrame.from_records(a, index=idx).apply(lambda x: x.str.strip('L').astype(int))
print (df)
0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 \
Jan 0 0 0 0 0 0 0 0 0 1 ... 0 0 0 1 1 1 1
Feb 0 0 0 0 0 0 0 0 0 1 ... 0 0 0 1 1 2 2
21 22 23
Jan 0 0 0
Feb 0 0 0
If need split
values first:
b = [['0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L'],
['0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 2L, 2L, 0L, 0L, 0L']]
idx = pd.to_datetime(pd.Series(range(1,len(a) + 1)), format='%m').dt.strftime('%b')
df1 = pd.DataFrame.from_records(b, index=idx)
.iloc[:,0]
.str.split(', ', expand=True)
.replace({'L':''}, regex=True)
.astype(int)
print (df1)
0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 \
Jan 0 0 0 0 0 0 0 0 0 1 ... 0 0 0 1 1 1 1
Feb 0 0 0 0 0 0 0 0 0 1 ... 0 0 0 1 1 2 2
21 22 23
Jan 0 0 0
Feb 0 0 0
[2 rows x 24 columns]
Upvotes: 1