terencetch

Reputation: 35

Merging column values in a data frame in Pandas / Python

I'm trying to merge the values of two columns (B and C) within the same dataframe. Sometimes B and C hold the same value, sometimes only one of them is filled in, and sometimes both are blank. The final result should show a single column that combines the two.

Initial data:

A          B          C          D
Apple      Canada     ''         RED
Bananas    ''         Germany    BLUE
Carrot     US         US         GREEN
Dorito     ''         ''         INDIGO

Expected Data:

A          B          C
Apple      Canada     RED
Bananas    Germany    BLUE
Carrot     US         GREEN
Dorito     ''         INDIGO

Upvotes: 0

Views: 49

Answers (3)

Mykola Zotko

Reputation: 17834

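You can sort the two strings in each row and take the last one; a blank value sorts before any of the names here:

# Sort B and C within each row and keep the last (largest) string
df['B'] = df[['B', 'C']].apply(lambda x: x.sort_values().iloc[-1], axis=1)

# Drop the old C and rename D to C
df = df.drop(columns='C').rename(columns={'D': 'C'})
print(df)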

Output:

         A        B       C
0    Apple   Canada     RED
1  Bananas  Germany    BLUE
2   Carrot       US   GREEN
3   Dorito       ''  INDIGO
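
For anyone who wants to run this end to end, here is a minimal sketch. The frame construction is my reconstruction of the question's sample data, and it assumes the blanks are empty strings rather than the literal two-character string `''`:

```python
import pandas as pd

# Assumed reconstruction of the sample data; blanks are taken to be empty strings
df = pd.DataFrame({
    "A": ["Apple", "Bananas", "Carrot", "Dorito"],
    "B": ["Canada", "", "US", ""],
    "C": ["", "Germany", "US", ""],
    "D": ["RED", "BLUE", "GREEN", "INDIGO"],
})

# Within each row, sort the B and C strings and keep the last one;
# the empty string sorts before any non-empty string
df["B"] = df[["B", "C"]].apply(lambda row: row.sort_values().iloc[-1], axis=1)

# Drop the old C and rename D to C to match the expected layout
df = df.drop(columns="C").rename(columns={"D": "C"})
print(df)
```

Note that if B and C ever hold two different non-empty values, this keeps whichever sorts last alphabetically, which may or may not be what you want.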

Upvotes: 1

Erfan

Reputation: 42916

Another way is to use list comprehensions:

# Build one set per row from columns B and C to get rid of duplicates
k = [set(b.strip() for b in a) for a in zip(df['B'], df['C'])]

# Join each set into a single string
k = [''.join(x) for x in k]

# Create desired column
df['B'] = k
df.drop('C', axis=1, inplace=True)

print(df)
         A        B       D
0    Apple   Canada     RED
1  Bananas  Germany    BLUE
2   Carrot       US   GREEN
3   Dorito           INDIGO
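
One thing to watch: a set has no guaranteed order, so if a row ever had two different non-empty values, the joined string could come out in either order and without a separator. A possible variation, sketched below, uses `dict.fromkeys` to drop duplicates while keeping first-seen order. The helper name `merge_unique`, the `", "` separator, and the sample frame are my own assumptions, not part of the question:

```python
import pandas as pd

# Assumed reconstruction of the sample data; blanks are taken to be empty strings
df = pd.DataFrame({
    "A": ["Apple", "Bananas", "Carrot", "Dorito"],
    "B": ["Canada", "", "US", ""],
    "C": ["", "Germany", "US", ""],
    "D": ["RED", "BLUE", "GREEN", "INDIGO"],
})

def merge_unique(*values, sep=", "):
    """Hypothetical helper: join the distinct non-blank values in first-seen order."""
    cleaned = (v.strip() for v in values)
    # dict.fromkeys removes duplicates but, unlike a set, preserves insertion order
    return sep.join(dict.fromkeys(v for v in cleaned if v))

# Combine B and C row by row, then drop the now-redundant C
df["B"] = [merge_unique(b, c) for b, c in zip(df["B"], df["C"])]
df = df.drop(columns="C")
print(df)
```

With the sample data the result is the same as above; the difference only shows up when a row has two distinct values, for example `merge_unique("US", "Canada")` gives `"US, Canada"`.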

Upvotes: 0

BENY

Reputation: 323276

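IIUC, you can replace the blanks with NaN and back-fill from C into B:

import numpy as np

# Turn the '' placeholders into NaN, then back-fill along each row so B takes C's value when B is missing
df['B'] = df[['B', 'C']].replace("''", np.nan).bfill(axis=1).loc[:, 'B']
df = df.drop(columns='C').rename(columns={'D': 'C'})
df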
Out[102]: 
         A        B       C
0    Apple   Canada     RED
1  Bananas  Germany    BLUE
2   Carrot       US   GREEN
3   Dorito      NaN  INDIGO
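
If you would rather keep the blank instead of getting NaN in the last row, one possible variation is `Series.where`, which swaps in C only where B is blank. This is a sketch using a different technique from the `bfill` above, and it assumes the blanks are empty strings (the frame below is a reconstruction of the sample data):

```python
import pandas as pd

# Assumed reconstruction of the sample data; blanks are taken to be empty strings
df = pd.DataFrame({
    "A": ["Apple", "Bananas", "Carrot", "Dorito"],
    "B": ["Canada", "", "US", ""],
    "C": ["", "Germany", "US", ""],
    "D": ["RED", "BLUE", "GREEN", "INDIGO"],
})

# Keep B where it is non-empty, otherwise fall back to the value in C
df["B"] = df["B"].where(df["B"].ne(""), df["C"])

# Drop the old C and rename D to C to match the expected layout
df = df.drop(columns="C").rename(columns={"D": "C"})
print(df)
```

Unlike the sorting approach, this always prefers B when both columns are filled in, so the result does not depend on alphabetical order.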

Upvotes: 2
