Reputation: 195
I have dataframe like this: df1:
Id name Checksum
2001 A e882
2002 B 2884
2002 C ee12,ee84
2003 D ee23
2004 E ee42,ee43
2006 F 2884,2993,3884,3855
2006 G 344,122,288
I want output like this:
Id name Checksum
2001 A e882
2002 B 2884
2002 C ee12
2002 C ee84
2003 D ee23
2004 E ee42
2004 E ee43
2006 F 2884
2006 F 2993
2006 F 3884
2006 F 3855
2006 G 344
2006 G 122
2006 G 288
I want to create new data frame like the above
How do I do it in python pandas ?
Upvotes: 1
Views: 582
Reputation: 862511
You can use str.split
for list
s, then get len
for length
.
Last create new DataFrame
by constructor
with numpy.repeat
and numpy.concatenate
:
s = df['Checksum'].str.split(',')
print (s)
0 [e882]
1 [2884]
2 [ee12, ee84]
3 [ee23]
4 [ee42, ee43]
5 [2884, 2993, 3884, 3855]
6 [344, 122, 288]
Name: Checksum, dtype: object
l = s.str.len()
print (l)
0 1
1 1
2 2
3 1
4 2
5 4
6 3
Name: Checksum, dtype: int64
cols = ['Id','name']
df = pd.DataFrame({x : np.repeat(df[x].values, l) for x in cols})
df['Checksum'] = np.concatenate(s)
df = df.reindex_axis(df.columns, axis=1)
print (df)
Id name Checksum
0 2001 A e882
1 2002 B 2884
2 2002 C ee12
3 2002 C ee84
4 2003 D ee23
5 2004 E ee42
6 2004 E ee43
7 2006 F 2884
8 2006 F 2993
9 2006 F 3884
10 2006 F 3855
11 2006 G 344
12 2006 G 122
13 2006 G 288
Altrnative with chain.from_iterable
:
from itertools import chain
s = df['Checksum'].str.split(',')
l = s.str.len()
cols = ['Id','name']
df = pd.DataFrame({x : np.repeat(df[x].values, l) for x in cols})
df['Checksum'] = list(chain.from_iterable(s))
df = df.reindex_axis(df.columns, axis=1)
print (df)
Id name Checksum
0 2001 A e882
1 2002 B 2884
2 2002 C ee12
3 2002 C ee84
4 2003 D ee23
5 2004 E ee42
6 2004 E ee43
7 2006 F 2884
8 2006 F 2993
9 2006 F 3884
10 2006 F 3855
11 2006 G 344
12 2006 G 122
13 2006 G 288
Upvotes: 2