Kristina
Kristina

Reputation: 25

Combine several csv with same columns naming

I've got 3 csv files (june.csv,july.csv, august.csv)

July.csv is like:

    company_name  |  number_of_employers | money_spend  
    comp1         | 20                   | 1200
    comp2         | 30                   | 1000 

August.csv is like:

     company_name |  number_of_employers | money_spend  
     comp1        | 25                   | 1800
     comp3        | 50                   | 2000 

I wanna combine these csvs to one with the structure like that:

              |  august                            |july
company_name  |  number_of_employers | money_spend |  number_of_employers | money_spend
comp1         | 25                   | 1800        |20                    |1200
comp2         | 0                    | 0           |30                    |1000
comp3         | 50                   | 2000        |0                     |0

How can I do this?

Upvotes: 0

Views: 27

Answers (2)

gal peled
gal peled

Reputation: 482

for the marge, you can do as following

July_df= pd.read_csv("july.csv")
August_df = pd.read_csv("august.csv")
merge(July_df, August_df, on='company_name', suffixes=['_july', '_august'])

not sure how to create the multi-index automatically

Upvotes: 0

Andrej Kesely
Andrej Kesely

Reputation: 195438

Try pd.read_csv with .set_index(). Then concatenate the dataframe using pd.concat:

# adjust `sep=` parameter accordingly:
df_1 = pd.read_csv("july.csv", sep=r",", engine="python").set_index(
    "company_name"
)
df_2 = pd.read_csv("august.csv", sep=r",", engine="python").set_index(
    "company_name"
)
df_out = pd.concat({"july": df_1, "august": df_2}, axis=1).fillna(0)
print(df_out)

Prints:

                            july                          august            
             number_of_employers money_spend number_of_employers money_spend
company_name                                                                
comp1                       20.0      1200.0                25.0      1800.0
comp2                       30.0      1000.0                 0.0         0.0
comp3                        0.0         0.0                50.0      2000.0

Upvotes: 1

Related Questions