Pandas: Join two columns in dataframe (without creating two)

Question

I have the following data:

    Col1 Col2 Col3 Col4
     A    G     K
     B          L   Q
     C    H     M
     D          N   R
     E    I     O   
     F    J     P

This is what I would like:

    Col1 NEW  Col3 
     A    G     K
     B    Q     L   
     C    H     M
     D    R     N   
     E    I     O   
     F    J     P

I've tried using the following solution and it sort of worked the way I needed. What ended up happening was that I would have dataframe A and then dataframe B. B would contain my new column and A would remain unchanged.

II have attempted to use .replace() and .fillna() but I can't seem to get the syntax right. For some reason when I try to do something like:

      csv[Col2] = csv.replace(to_replace=" ", value=csv[Col4]])

I end up having the entire contents of Col4 overwrite Col2 (blank spaces included). This seems to happen with .replace() and .fillna().

I'm not sure if it helps but in these situations Col2 would only be missing data where Col4 had data. Any ideas?

SOLUTION

Thank you to everyone that replied. I realize now that I have been chasing my tail because of a syntactical error. This was another line of code I tried but couldn't make work. I got the "One column overwrites another" issue (see above).

    csv[Col2] = csv.fillna(csv[Col4]])

But after revisiting that page John Galt linked below I felt like I was missing something..Then it hit me like a train.

    csv[Col2] = csv[Col2].fillna(csv[Col4]])

This worked perfectly. And I feel I like paying closer attention may have resolved this a lot sooner. Thank you all for your patients!

UPDATE ONE

I am appending more information about my dataframe in case it ever helps anyone in the future.

    
    Int64Index: 50000 entries, 0 to 49999
    Data columns (total 6 columns):
    Col 1                      50000 non-null object
    Col 2                      4652 non-null object
    Col 3                      50000 non-null object
    Col 4                      45347 non-null object
    Col 5                      50000 non-null object
    Col 6                      50000 non-null object
    dtypes: object(6)
    memory usage: 1.5+ MB
    None

JohnE · Accepted Answer

To avoid creating any new temporary variables or dataframes, you could just replace 'Col2' with new values and then drop 'Col4'. There are a few ways to do that.

Numpy where() is a good general solution here that can handle ''(empty or blank), ' '(space), or nan:

df['Col2'] = np.where( df.Col2 == ' ',   df.Col4, df.Col2 )  # space

df['Col2'] = np.where( df.Col2.isnull(), df.Col4, df.Col2 )  # blank or nan

@EdChum's answer should also work and fillna() may be easiest (as in @JohnGalt's link), but won't work if you have a space character there.

df['Col2'] = df['Col2'].fillna(df.Col4)

Afterwards, just drop 'Col4'

df = df.drop('Col4',axis=1)

Pandas: Join two columns in dataframe (without creating two)

Answers (1)

Related Questions