Jitesh
Jitesh

Reputation: 195

merge Two data frames of different size in python pandas

I have dataframe like this: df1:

Id   name Checksum
2001  A   e882
2002  B   2884
2002  C   ee12,ee84
2003  D   ee23
2004  E   ee42,ee43
2006  F   2884,2993,3884,3855
2006  G   344,122,288

I want output like this:

Id   name Checksum
2001  A   e882
2002  B   2884
2002  C   ee12
2002  C   ee84
2003  D   ee23
2004  E   ee42
2004  E   ee43
2006  F   2884
2006  F   2993
2006  F   3884
2006  F   3855
2006  G   344
2006  G   122
2006  G   288

I want to create new data frame like the above

How do I do it in python pandas ?

Upvotes: 1

Views: 582

Answers (1)

jezrael
jezrael

Reputation: 862511

You can use str.split for lists, then get len for length.

Last create new DataFrame by constructor with numpy.repeat and numpy.concatenate:

s = df['Checksum'].str.split(',')
print (s)
0                      [e882]
1                      [2884]
2                [ee12, ee84]
3                      [ee23]
4                [ee42, ee43]
5    [2884, 2993, 3884, 3855]
6             [344, 122, 288]
Name: Checksum, dtype: object

l = s.str.len()
print (l)
0    1
1    1
2    2
3    1
4    2
5    4
6    3
Name: Checksum, dtype: int64

cols = ['Id','name']
df = pd.DataFrame({x : np.repeat(df[x].values, l) for x in cols})
df['Checksum'] = np.concatenate(s)
df = df.reindex_axis(df.columns, axis=1)
print (df)

      Id name Checksum
0   2001    A     e882
1   2002    B     2884
2   2002    C     ee12
3   2002    C     ee84
4   2003    D     ee23
5   2004    E     ee42
6   2004    E     ee43
7   2006    F     2884
8   2006    F     2993
9   2006    F     3884
10  2006    F     3855
11  2006    G      344
12  2006    G      122
13  2006    G      288

Altrnative with chain.from_iterable:

from  itertools import chain

s = df['Checksum'].str.split(',')
l = s.str.len()
cols = ['Id','name']
df = pd.DataFrame({x : np.repeat(df[x].values, l) for x in cols})
df['Checksum'] = list(chain.from_iterable(s))
df = df.reindex_axis(df.columns, axis=1)
print (df)
         Id name Checksum
0   2001    A     e882
1   2002    B     2884
2   2002    C     ee12
3   2002    C     ee84
4   2003    D     ee23
5   2004    E     ee42
6   2004    E     ee43
7   2006    F     2884
8   2006    F     2993
9   2006    F     3884
10  2006    F     3855
11  2006    G      344
12  2006    G      122
13  2006    G      288

Upvotes: 2

Related Questions