Juan Luis Chulilla

Reputation: 368

How can one merge or concatenate Pandas series with different lengths and empty values?

I have a number of series in which some values are blank strings, something like this:

import pandas as pd
serie_1 = pd.Series(['a','','b','c','',''])
serie_2 = pd.Series(['','d','','','e','f','g'])

Filtering out the blanks in each series is no problem, for example with serie_1 = serie_1[serie_1 != '']

However, when I combine them into one DataFrame, either by building the DataFrame from them directly or by building two one-column DataFrames and concatenating them, I don't get the result I'm looking for.

I'm looking for a table like this:

    col1 col2
0   a    d
1   b    e
2   c    f
3   nan  g

But I am getting something like this:

0   a   nan
1   nan d
2   b   nan
3   c   nan
4   nan e
5   nan f
6   nan g

How could I obtain the table I'm looking for?

Thanks in advance

Upvotes: 1

Views: 1181

Answers (3)

Alex Luis Arias

Reputation: 1394

I would just filter out the blank values before creating the dataframe like this:

import pandas as pd

def filter_blanks(string_list):
    return [e for e in string_list if e]

serie_1 = pd.Series(filter_blanks(['a','','b','c','','']))
serie_2 = pd.Series(filter_blanks(['','d','','','e','f','g']))

pd.concat([serie_1, serie_2], axis=1)

Which results in:

    0   1
0   a   d
1   b   e
2   c   f
3   NaN g

Upvotes: 1

jsmart

Reputation: 3001

Here is one approach, if I understand correctly:

pd.concat([
    serie_1[lambda x: x != ''].reset_index(drop=True).rename('col1'),
    serie_2[lambda x: x != ''].reset_index(drop=True).rename('col2')
], axis=1)

   col1  col2
0   a    d
1   b    e
2   c    f
3   NaN  g

The logic is: select the non-empty entries (with the lambda expression), restart the index numbering from 0 (with reset_index(drop=True)), set the column names (with rename), and build a wide table (with axis=1 in pd.concat).
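
To see why the index reset matters, here is a small sketch using the series from the question. The names kept_1, kept_2, misaligned and aligned are mine, chosen for illustration. With axis=1, pd.concat lines rows up by index label, so filtered series that keep their original labels end up staggered:

```python
import pandas as pd

serie_1 = pd.Series(['a', '', 'b', 'c', '', ''])
serie_2 = pd.Series(['', 'd', '', '', 'e', 'f', 'g'])

# Filtering keeps the original index labels: 0, 2, 3 and 1, 4, 5, 6.
kept_1 = serie_1[serie_1 != '']
kept_2 = serie_2[serie_2 != '']

# concat(axis=1) aligns on those labels, so the values land on different rows.
misaligned = pd.concat([kept_1, kept_2], axis=1)

# Renumbering from 0 first makes the values line up row by row.
aligned = pd.concat(
    [kept_1.reset_index(drop=True), kept_2.reset_index(drop=True)],
    axis=1,
)

print(misaligned.shape)  # (7, 2)
print(aligned.shape)     # (4, 2)
```

The seven-row frame is the staggered table from the question; the four-row frame is the compact one.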

Upvotes: 2

Chris

Reputation: 29742

One way using pandas.concat:

ss = [serie_1, serie_2]
df = pd.concat([s[s.ne("")].reset_index(drop=True) for s in ss], axis=1)
print(df)

Output:

     0  1
0    a  d
1    b  e
2    c  f
3  NaN  g
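
If named columns are wanted for any number of series, the same comprehension can be wrapped in a small helper. This is only a sketch: stack_non_blank and its named_series argument are hypothetical names, not part of pandas. It relies on pd.concat accepting a dict and using its keys as the column labels:

```python
import pandas as pd


def stack_non_blank(named_series):
    """Drop empty strings from each series and place them side by side.

    `named_series` maps the desired column name to a Series
    (a hypothetical helper, not a pandas function).
    """
    cleaned = {
        name: s[s.ne("")].reset_index(drop=True)
        for name, s in named_series.items()
    }
    # Passing a dict to concat uses its keys as the column labels.
    return pd.concat(cleaned, axis=1)


serie_1 = pd.Series(['a', '', 'b', 'c', '', ''])
serie_2 = pd.Series(['', 'd', '', '', 'e', 'f', 'g'])

df = stack_non_blank({"col1": serie_1, "col2": serie_2})
print(df)
```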

Upvotes: 2
