Reputation: 309
I am extracting data from many CSV files and saving it into one file. This process is repetitive and every so often, I have new versions of the files and so the underlying data needs to be updated. I am having a hard time getting the data that is desired. Here is an example of what I am trying to do:
old file (/old/Test1.csv)
tests col1 col2 col3
test1 11 8 5
test3 9 5 7
test5 12 9 -1
new file (/new/Test1.csv)
tests col2 col3 col4
test1 8 7 15
test3 5 9 10
test7 1 4 9
test9 11 10 12
Note that in the new file, col1 is absent, there is a new column col4, row test5 is absent and there are two new rows test7 and test9. The desired output should:
For the data shown above, this should be the updated file (Test1_update.csv)
tests col1 col2 col3 col4
test1 11 8 7 15
test3 9 5 9 10
test5 12 9 -1 0
test7 0 1 4 9
test9 0 11 10 12
I am able to get to this point using the code below:
tests col1 col2 col3 col4
test1 Nan 8 7 15
test3 Nan 5 9 10
test5 Nan Nan Nan Nan
test7 Nan 1 4 9
test9 Nan 11 10 12
import pandas as pd
import numpy as np
df1 = pd.read_csv('\\dir\\test1.csv', index_col=0)
df2 = pd.read_csv('\\dir\\test2.csv', index_col=0)
new_index = list(set(list(df1.index.values)+list(df2.index.values)))
new_cols = list(set(list(df1.columns.values)+list(df2.columns.values)))
df3 = pd.DataFrame(index=new_index, columns=new_cols)
df4 = df2.reindex(df3.index)
df4 = df4.join(df3, rsuffix='_P')
df4 = df4.loc[:,~df4.columns.str.endswith('_P')]
print df4
Upvotes: 3
Views: 7508
Reputation: 309
I was able to get the desired data frame.
import pandas as pd
import numpy as np
df1 = pd.read_csv('\\dir\\test1.csv', index_col=0)
df2 = pd.read_csv('\\dir\\test2.csv', index_col=0)
new_index = list(set(list(df1.index.values)+list(df2.index.values)))
df2 = df2.reindex(new_index)
df2 = df2.join(df1, rsuffix='_P')
df2 = df2.loc[:,~df2.columns.str.endswith('_P')].fillna(df1).fillna(0)
df2.sort_index(inplace=True)
print df2.to_string()
col2 col3 col4 col1
test1 8 7 15 11
test3 5 9 10 9
test5 9 -1 0 12
test7 1 4 9 0
test9 11 10 12 0
Upvotes: 2