Reputation: 159
As I am completely new to pandas, I would like to ask.
I have two CSV files.
One of them has all the colors of different languages in one column:
I have color blue in three rows here: azul, bleu all means blue in different languages, so they should be in the same group 1. rouge and rojo means red, so they are also in the same group, so they should have 2.
Here is my first table colors:
name | group |
---|---|
blue | 1 |
azul | 1 |
bleu | 1 |
rouge | |
red | |
rojo | |
verde | |
vert | |
green |
and so on, in my csv this column group is empty and I have to fill it
I have also second CSV file, when it looks like this
Table colors_language:
english | french | spanish | group |
---|---|---|---|
green | vert | verde | 3 |
red | rouge | rojo | 2 |
blue | bleu | azul | 1 |
I want to fill the column group in the first CSV file by comparing it to the second CSV file which has information.
I have found out how to compare on one column, but how I compare on column from one csv to three columns from another csv to fill the group column in the first CSV?
df1 = pd.read_csv('colors.csv')
df2 = pd.read_csv('colors_language.csv')
mergedStuff = pd.merge(df1, df2, on=['name'], how='inner')
#I have tried also this, but it gives me information of Empty DataFrame
my_list = df1['name'].unique().tolist()
lang = df2[df2['english'].isin(my_list)]
print(lang)
Here is my code, but it doesn't work. I think this is because column name is only in one csv, but how I can join name from the first csv to English, French, Spanish columns in second csv?
Upvotes: 1
Views: 873
Reputation: 1624
You can first use pd.melt
to transfer all columns in df2 in the rows, and then use pd.merge
to merge it with df1 on the color names column:
df2 = pd.melt(df2, id_vars=['group'], value_vars=['english', 'french', 'spanish'], value_name='color')
merged_df = pd.merge(df1, df2, left_on='name', right_on='color')[['name', 'group']]
Result:
name group
0 blue 1
1 azul 1
2 bleu 1
3 rouge 2
4 red 2
5 rojo 2
6 verde 3
7 vert 3
8 green 3
Upvotes: 1
Reputation: 13349
You can use map
Using df2
you can create a dict d
and then map the name values to their corresponding group.
d = dict(zip(df2.T.values[3], df2.values[:,:3].tolist()))
df1['group'] = df1.name.map(lambda x: [k for k in d if x in d[k]][0])
d:
{3: ['green', 'vert', 'verde'],
2: ['red', 'rouge', 'rojo'],
1: ['blue', 'bleu', 'azul']}
df1:
name group
0 blue 1
1 azul 1
2 bleu 1
3 rouge 2
4 red 2
5 rojo 2
6 verde 3
7 vert 3
8 green 3
Upvotes: 1