Reputation: 691
Given the following dataframe:
FrameLen FrameCapLen IPHdrLen ... TLSRecordLen TLSAppData PacketTime
0 [117, 66] [117, 66] [20, 20] ... [46.0, nan] [nan, nan] 0.000045
1 [117, 66] [117, 66] [20, 20] ... [46.0, nan] [nan, nan] 0.000024
2 [117, 66] [117, 66] [20, 20] ... [46.0, nan] [nan, nan] 0.000027
3 [117, 66] [117, 66] [20, 20] ... [46.0, nan] [nan, nan] 0.000024
4 [117, 66] [117, 66] [20, 20] ... [46.0, nan] [nan, nan] 0.000011
How can I expand each column that contains a list into respective new columns, so that FrameLen
would become FrameLen_1, FrameLen_2, ...
, etc.?
I know this can be done manually for each column manually (see below) but how could this be done automatically?
Example of manual operation (which looks inefficient?):
tags = df['FrameLen'].apply(pd.Series)
tags = column.rename(columns = lambda x : 'FrameLen_' + str(x))
pd.concat([df[:], tags[:]], axis=1)
del df['FrameLen']
Upvotes: 0
Views: 85
Reputation: 35155
It would be redundant to iterate through the columns, but a simple approach would be to join the split columns and update them with new ones
import pandas as pd
import numpy as np
import io
data = '''
,FrameLen,FrameCapLen,IPHdrLen,TLSRecordLen,TLSAppData,PacketTime
0,"[117, 66]","[117, 66]","[20, 20]","[46.0, nan]","[nan, nan]",0.000045
1,"[117, 66]","[117, 66]","[20, 20]","[46.0, nan]","[nan, nan]",0.000024
2,"[117, 66]","[117, 66]","[20, 20]","[46.0, nan]","[nan, nan]",0.000027
3,"[117, 66]","[117, 66]","[20, 20]","[46.0, nan]","[nan, nan]",0.000024
4,"[117, 66]","[117, 66]","[20, 20]","[46.0, nan]","[nan, nan]",0.000011
'''
df = pd.read_csv(io.StringIO(data), sep=',', index_col=0)
# new columns create
c1 = df['FrameLen'].str.replace(r'^\[|\]$','').str.split(',', expand=True)
c2 = df['FrameCapLen'].str.replace(r'^\[|\]$','').str.split(',', expand=True)
c3 = df['IPHdrLen'].str.replace(r'^\[|\]$','').str.split(',', expand=True)
c4 = df['TLSRecordLen'].str.replace(r'^\[|\]$','').str.split(',', expand=True)
c5 = df['TLSAppData'].str.replace(r'^\[|\]$','').str.split(',', expand=True)
# new columns name create
new_col = [['{}_0'.format(col),'{}_1'.format(col)] for col in df.columns[:5]]
new_col = sum(new_col,[])
new_col.append('PacketTime')
# concat()
df_all = pd.concat([c1,c2,c3,c4,c5,df[['PacketTime']]], axis=1)
df_all.columns = new_col
df_all
| | FrameLen_0 | FrameLen_1 | FrameCapLen_0 | FrameCapLen_1 | IPHdrLen_0 | IPHdrLen_1 | TLSRecordLen_0 | TLSRecordLen_1 | TLSAppData_0 | TLSAppData_1 | PacketTime |
|---:|-------------:|-------------:|----------------:|----------------:|-------------:|-------------:|-----------------:|:-----------------|---------------:|:---------------|-------------:|
| 0 | 117 | 66 | 117 | 66 | 20 | 20 | 46 | nan | nan | nan | 4.5e-05 |
| 1 | 117 | 66 | 117 | 66 | 20 | 20 | 46 | nan | nan | nan | 2.4e-05 |
| 2 | 117 | 66 | 117 | 66 | 20 | 20 | 46 | nan | nan | nan | 2.7e-05 |
| 3 | 117 | 66 | 117 | 66 | 20 | 20 | 46 | nan | nan | nan | 2.4e-05 |
| 4 | 117 | 66 | 117 | 66 | 20 | 20 | 46 | nan | nan | nan | 1.1e-05 |
Upvotes: 0
Reputation: 34056
You can use a for
loop with df.add_prefix
to append column name:
Take below df for example:
In [93]: In [55]: df = pd.DataFrame({'FrameLen':[[117, 66], [117, 66], [117, 66]], 'FrameCapLen':[[117, 66],[117, 66],[117, 66]], 'IPHdrLen':[[20, 20],[20, 20],[20, 20]], 'PacketTime':[0.000045, 0.000024,
...: 0.000027]})
In [94]: df
Out[94]:
FrameLen FrameCapLen IPHdrLen PacketTime
0 [117, 66] [117, 66] [20, 20] 0.000045
1 [117, 66] [117, 66] [20, 20] 0.000024
2 [117, 66] [117, 66] [20, 20] 0.000027
In [90]: d = pd.DataFrame() ## empty dataframe
In [91]: for i,j in df.items():
...: if df['{}'.format(i)].dtype == 'object':
...: temp = pd.DataFrame(df[i].values.tolist()).add_prefix('{}_'.format(i))
...: d = pd.concat([d,temp], axis=1)
...: else:
...: d = pd.concat([d, df[i]], axis=1)
...:
In [92]: d
Out[92]:
FrameLen_0 FrameLen_1 FrameCapLen_0 FrameCapLen_1 IPHdrLen_0 IPHdrLen_1 PacketTime
0 117 66 117 66 20 20 0.000045
1 117 66 117 66 20 20 0.000024
2 117 66 117 66 20 20 0.000027
Upvotes: 1