Reputation: 1614
I have two similar tables ("hist.csv") :
Historical :
id | url | url2 | url3 | Time
1 A B C 5
2 D E F 8
And ("new.csv") :
New :
id | url | url2 | url3 | Time
1 A Z K 9
2 G H I 11
I want to update the New.Time column with the Historical.Time value, if "url" columns match. i.e. desired output here with url "A" updated :
New2 :
id | url | url2 | url3 | Time
1 A Z K 5
2 G H I 11
I tried the following :
Historical = pd.DataFrame.from_csv("hist.csv", index_col='id', sep='\t', encoding='utf-8')
New = pd.DataFrame.from_csv("new.csv", index_col='id', sep='\t', encoding='utf-8')
for index, row in New.iterrows():
New.loc[index,'Time']=Historical.loc[historical['url'] == row['url'],'Time']
New.to_csv("new2.csv", sep='\t', encoding='utf-8')
Raising :
ValueError: Must have equal len keys and value when setting with an iterable
PS : I found this thread : Updating a DataFrame based on another DataFrame But it looks like the proposed solution with "merge" does not really fit my needs, as I have many columns ?
Upvotes: 2
Views: 3206
Reputation: 90999
The basic issue is that Historical.loc[Historical['url'] == row['url'],'Time']
returns a Series (even if there is only one row or no row where the condition - Historical['url'] == row['url']
-matches). Example -
In [15]: df
Out[15]:
A B
0 1 2
1 2 3
In [16]: df.loc[df['A']==1,'B']
Out[16]:
0 2
Name: B, dtype: int64
And then you try to set this DataFrame into a single cell of your New
dataframe, which is what is causing the issue.
Since in the comments you say -
I may have several rows with "url" in Historical, but they will have the same Time value. In that case, I should consider the first occurence/match.
A quick fix for your code would be to check whether row['url']
exists in the other DataFrame, and only if true, to get the value from it using -
for index, row in New.iterrows():
if row['url'] in Historical['url'].values:
row['Time']=Historical.loc[Historical['url'] == row['url'],'Time'].values[0]
Upvotes: 1