Mr. Confused
Mr. Confused

Reputation: 255

CSV value mapping from 2 files like map in pandas

I have two csv files that i created with python from an unstructured data but i don't want my script to output two files once i run the script on a json. So lets say i have a file A with columns as follows:

File 1:

feats   ID     A         B     C       E
AA      123   3343      234   2342    112
BB      121   3342      237   2642    213``
CC      122   3341      232   2352    912
DD      123   3343      233   5342    12
EE      121   3345      235   2442    2112

...and so on with lets say, 10000 rows of different values and 6 columns. Now I want to check these values of column "ID" against file 2 and merge on the values of ID.

File 2:

Char_Name           ID    Cosmic Awareness            
  Uatu              123          3.4              
  Galan             121          4.5              ``
  Norrin Radd       122          1.6              
  Shalla-bal        124          0.3              
  Nova              125          1.2      

This file 2 has only 5 rows for 5 different values for b and lets say 23 column values. I can do this easily with map or apply in pandas but i'm dealing with 1000's of files and don't wanna do that. Is their any way like mapping the file 2 values (name and cosmic awareness columns) to File 1 by adding new columns titled 'name' and 'cosmic' (from file 2) by matching the values with corresponding ID values on File 1 and File 2. The expected output should be somewhat like this.

Final File:

feats   ID     A         B     C       E      Char_Name    Cosmic Awareness
AA      123   3343      234   2342    112     Uatu           3.4
BB      121   3342      237   2642    213``   Galan          4.5
CC      122   3341      232   2352    912     Norrin Radd    1.6
DD      123   3343      233   5342    12      Uatu           3.4
EE      121   3345      235   2442    2112    Galan          4.5

Thanks in advance and if their is any way to improve this question, the suggestions are welcome. I will incorporate them here. I have added the expected outcome above.

Upvotes: 1

Views: 1394

Answers (1)

jezrael
jezrael

Reputation: 863166

I think need glob for all file names and then in list comprehension create DataFrame:

from functools import reduce
import glob

files = glob.glob('files/*.csv')
dfs = [pd.read_csv(fp) for fp in files]

Last merge together:

df = reduce(lambda left,right: pd.merge(left,right,on='ID'), dfs)

For outer join is possible use concat:

import glob

files = glob.glob('files/*.csv')
dfs = [pd.read_csv(fp, index_col=['ID']) for fp in files]

df = pd.concat(dfs, axis=1)

Upvotes: 1

Related Questions