Paulina
Paulina

Reputation: 159

How to fill one column in a csv by comparing values to three different columns in another csv file?

As I am completely new to pandas, I would like to ask.

I have two CSV files.

One of them has all the colors of different languages in one column:

I have color blue in three rows here: azul, bleu all means blue in different languages, so they should be in the same group 1. rouge and rojo means red, so they are also in the same group, so they should have 2.

Here is my first table colors:

name group
blue 1
azul 1
bleu 1
rouge
red
rojo
verde
vert
green

and so on, in my csv this column group is empty and I have to fill it

I have also second CSV file, when it looks like this

Table colors_language:

english french spanish group
green vert verde 3
red rouge rojo 2
blue bleu azul 1

I want to fill the column group in the first CSV file by comparing it to the second CSV file which has information.

I have found out how to compare on one column, but how I compare on column from one csv to three columns from another csv to fill the group column in the first CSV?

df1 = pd.read_csv('colors.csv')
df2 = pd.read_csv('colors_language.csv')

mergedStuff = pd.merge(df1, df2, on=['name'], how='inner')

#I have tried also this, but it gives me information of Empty DataFrame
my_list = df1['name'].unique().tolist()
lang = df2[df2['english'].isin(my_list)]
print(lang)

Here is my code, but it doesn't work. I think this is because column name is only in one csv, but how I can join name from the first csv to English, French, Spanish columns in second csv?

Upvotes: 1

Views: 873

Answers (2)

ashkangh
ashkangh

Reputation: 1624

You can first use pd.melt to transfer all columns in df2 in the rows, and then use pd.merge to merge it with df1 on the color names column:

df2 = pd.melt(df2, id_vars=['group'], value_vars=['english', 'french', 'spanish'], value_name='color')
merged_df = pd.merge(df1, df2, left_on='name', right_on='color')[['name', 'group']]

Result:

    name    group
0   blue    1
1   azul    1
2   bleu    1
3   rouge   2
4   red     2
5   rojo    2
6   verde   3
7   vert    3
8   green   3

Upvotes: 1

Pygirl
Pygirl

Reputation: 13349

You can use map

Using df2 you can create a dict d and then map the name values to their corresponding group.

d = dict(zip(df2.T.values[3], df2.values[:,:3].tolist()))
df1['group'] = df1.name.map(lambda x: [k for k in d if x in d[k]][0])

d:

{3: ['green', 'vert', 'verde'],
 2: ['red', 'rouge', 'rojo'],
 1: ['blue', 'bleu', 'azul']}

df1:

    name    group
0   blue    1
1   azul    1
2   bleu    1
3   rouge   2
4   red     2
5   rojo    2
6   verde   3
7   vert    3
8   green   3

Upvotes: 1

Related Questions