Reputation: 956
I have several csv files with approximately the following structure:
name,title,status,1,2,3
name,title,status,4,5,6
name,title,status,7,8,9
Most of the name values are the same in all files; only the columns 1,2,3,4... differ.
For each new file I need to add its columns to the existing rows, append any rows that are not there yet, and update the values of the rows that already exist.
For example, I have 2 tables:
name,title,status,1,2,3
Foo,Bla-bla-bla,10,45.6,12.3,45.2
Bar,Too-too,13,13.4,22.6,75.1
name,title,status,4,5,6
Foo,Bla-bla-bla,14,25.3,125.3,5.2
Fobo,Dom-dom,20,53.4,2.9,11.3
And at the output I expect a table:
name,title,status,1,2,3,4,5,6
Foo,Bla-bla-bla,14,45.6,12.3,45.2,25.3,125.3,5.2
Bar,Too-too,13,13.4,22.6,75.1,,,
Fobo,Dom-dom,20,,,,53.4,2.9,11.3
I have not found anything similar; can anyone tell me how to do this?
Upvotes: 1
Views: 45
Reputation: 294258
It looks like you want to keep just one version of ['name', 'title', 'status'], and from your example you prefer to keep the last 'status' encountered. I'd use pd.concat and follow that up with a groupby to filter out the duplicate status.
import pandas as pd

# Read each file with name/title/status as the index, align them side by side,
# then keep the last status seen for every name/title pair.
df = pd.concat([
    pd.read_csv(fp, index_col=['name', 'title', 'status'])
    for fp in ['data1.csv', 'data2.csv']
], axis=1).reset_index('status').groupby(level=['name', 'title']).last()
df
                  status     1     2     3     4      5     6
name title
Bar  Too-too          13  13.4  22.6  75.1   NaN    NaN   NaN
Fobo Dom-dom          20   NaN   NaN   NaN  53.4    2.9  11.3
Foo  Bla-bla-bla      14  45.6  12.3  45.2  25.3  125.3   5.2
Then df.to_csv() produces:
name,title,status,1,2,3,4,5,6
Bar,Too-too,13,13.4,22.6,75.1,,,
Fobo,Dom-dom,20,,,,53.4,2.9,11.3
Foo,Bla-bla-bla,14,45.6,12.3,45.2,25.3,125.3,5.2
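For a quick self-contained check, the two sample tables from the question can stand in for the files via io.StringIO; a minimal sketch of the same approach (the data below is copied from the question):

import io
import pandas as pd

csv1 = io.StringIO("""name,title,status,1,2,3
Foo,Bla-bla-bla,10,45.6,12.3,45.2
Bar,Too-too,13,13.4,22.6,75.1
""")
csv2 = io.StringIO("""name,title,status,4,5,6
Foo,Bla-bla-bla,14,25.3,125.3,5.2
Fobo,Dom-dom,20,53.4,2.9,11.3
""")

# Same pipeline as above, just reading from the in-memory buffers.
df = pd.concat([
    pd.read_csv(fp, index_col=['name', 'title', 'status'])
    for fp in (csv1, csv2)
], axis=1).reset_index('status').groupby(level=['name', 'title']).last()

print(df.to_csv())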
Upvotes: 2
Reputation: 93161
Keep merging them:
import pandas as pd

df = None
for path in ['data1.csv', 'data2.csv']:
    sub_df = pd.read_csv(path)
    if df is None:
        df = sub_df
    else:
        # Outer merge keeps rows from both frames; a row is combined only
        # when name, title and status all match.
        df = df.merge(sub_df, on=['name', 'title', 'status'], how='outer')
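The same repeated merge can also be written more compactly with functools.reduce; a minimal sketch, assuming the same data1.csv / data2.csv file names as above:

from functools import reduce
import pandas as pd

paths = ['data1.csv', 'data2.csv']

# Fold the remaining files into the first one with successive outer merges.
df = reduce(
    lambda acc, path: acc.merge(pd.read_csv(path),
                                on=['name', 'title', 'status'], how='outer'),
    paths[1:],
    pd.read_csv(paths[0]),
)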
Upvotes: 0