Reputation:
I am trying to do some file merging with Latitude and Longitude.
Input File1.csv
Name,Lat,Lon,timeseries(n)
London,80.5234,121.0452,523
London,80.5234,121.0452,732
London,80.5234,121.0452,848
Paris,90.4414,130.0252,464
Paris,90.4414,130.0252,829
Paris,90.4414,130.0252,98
New York,110.5324,90.0023,572
New York,110.5324,90.0023,689
New York,110.5324,90.0023,794
File2.csv
Name,lat,lon,timeseries1
London,80.5234,121.0452,500
Paris,90.4414,130.0252,400
New York,110.5324,90.0023,700
Now Expected output is
File2.csv
Name,lat,lon,timeseries1,timeseries(n) #timeseries is 24 hrs format 17:45:00
London,80.5234,121.0452,500,2103 #Addition of all three values
Paris,90.4414,130.0252,400,1391
New York,110.5324,90.0023,700,2055
With python, numpy and dictionaries
it would be straight as key = sum of values
but I want to use Pandas
Please suggest me how to start with or may be a point me to some example. I have not see anything like Dictionary types with Pandas with Latitude
and Longitude
.
Upvotes: 1
Views: 534
Reputation: 394021
Perform a groupby
aggregation on the first df, call sum
and then merge
this with the other df:
In [12]:
gp = df.groupby('Name')['timeseries(n)'].sum().reset_index()
df1.merge(gp, on='Name')
Out[14]:
Name Lat Lon timeseries1 timeseries(n)
0 London 80.5234 121.0452 500 2103
1 Paris 90.4414 130.0252 400 1391
2 New York 110.5324 90.0023 700 2055
the aggregation looks like this:
In [15]:
gp
Out[15]:
Name timeseries(n)
0 London 2103
1 New York 2055
2 Paris 1391
Your csv files can loaded using read_csv
so something like:
df = pd.read_csv('File1.csv')
df1 = pd.read_csv('File2.csv')
Upvotes: 1