Reputation: 145
Update: I am having some issue with the answers and guessing it's something to do with the rows are not matching. (updated the today_df so that today and yesterday has diff # of rows).
I need to compare yesterday's dataframe and today's new dataframe to update one column on today's dataframe. for example: yesterday_df:
fruit price qty comment
apple 1.5 10 In Stock
banana 0.7 20 In Stock
Pear 3.0 5 In Stock
Watermelon 1.2 30 Out Stock
today_df (updated the today_df so that today and yesterday has diff # of rows)
fruit price qty comment
apple 1.5 10
Strawberry 1.7 20
Pineapple 0.9 5
Watermelon 1.2 30
Peach 2.1 10
Desired output: today_df
fruit price qty comment
apple 1.5 10 In Stock <-- from yesterday
Strawberry 1.7 20 <-- empty
Pineapple 0.9 5 <-- empty
Watermelon 1.2 30 Out Stock <-- From Yesterday
peach 2.1 10 <-- empty
For Today's Dataframe, the comment from previous day for Apple and Watermelon were carried over and the new info (Strawberry and Pineapple) from data were left blank.
i've tried to use series mapping:
fruit_map = yesterday_df.set_index('fruit')['Comment']
today_df['Comment']=today_df['fruit'].map(fruit_map).fillna(today_df['Comment'])
today_df
but getting this error:
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
fruit_map = yesterday_df.set_index('fruit')['Comment']
today_df['Comment']=today_df['fruit'].map(fruit_map).fillna(today_df['Comment'])
today_df
today_df
fruit price qty comment
apple 1.5 10 In Stock
Strawberry 1.7 20
Pineapple 0.9 5
Watermelon 1.2 30 Out Stock
Upvotes: 2
Views: 4827
Reputation: 25239
Edit:
Since your yesterday_df
has some duplicates in fruit
, you commands throw the errors you posted. Any solution using index
also errors out. You have 2 ways to solve it.
Method 1: get rid duplicates in yesterday_df
and your solution using map
as you posted would work without any issue.
Method 2: Change your solution a little bit with groupby yesterday_df
on fruit
and pick first
of each group to use for updating today_df
as follows:
fruit_map = y_df.groupby('fruit').first()['comment']
t_df['comment'] = t_df['fruit'].map(fruit_map).fillna(t_df['comment'])
Original: with data as belows:
y_df
Out[2896]:
fruit price qty comment
0 apple 1.5 10 In Stock
1 banana 0.7 20 In Stock
2 Pear 3.0 5 In Stock
3 Watermelon 1.2 30 Out Stock
t_df
Out[2897]:
fruit price qty comment
0 apple 1.5 10
1 Strawberry 1.7 20
2 Pineapple 0.9 5
3 Watermelon 1.2 30
You just need this command
t_df.comment = y_df.comment.where((y_df.price == t_df.price) & (y_df.fruit == t_df.fruit), '')
Out[2908]:
fruit price qty comment
0 apple 1.5 10 In Stock
1 Strawberry 1.7 20
2 Pineapple 0.9 5
3 Watermelon 1.2 30 Out Stock
Upvotes: 2
Reputation: 2137
You were on the right path, but you can leverage indexes to assign values so there's no need for map. When you assign values to a dataframe using a Series, it'll align on the index. The key here is to make sure that you're using "fruit" as the index for both dataframes. The easiest way to do this is:
import pandas as pd
from pandas.compat import StringIO
yesterday = """
fruit,price,qty,comment,col to show it works with different shapes
apple,1.5,10,In Stock,bleh
banana,0.7,20,In Stock,blah
Pear,3.0,5,In Stock,bip
Watermelon,1.2,30,Out Stock,bop
"""
today = """
fruit,price,qty,comment
apple,1.5,10,
Strawberry,1.7,20,
Pineapple,0.9,5,
Watermelon,1.2,30,
Peach,2.1,10,
"""
yesterday_df = pd.read_csv(StringIO(yesterday), sep=",").set_index("fruit")
today_df = pd.read_csv(StringIO(today), sep=",").set_index("fruit")
today_df["comment"] = yesterday_df["comment"]
today_df
Out[1]:
price qty comment
fruit
apple 1.5 10 In Stock
Strawberry 1.7 20 NaN
Pineapple 0.9 5 NaN
Watermelon 1.2 30 Out Stock
Peach 2.1 10 NaN
Alternatively, you could set the index to 'fruit' when you read in your data and avoid the set_index
call.
You might also want to clean up you index ('fruit' column) by lower casing it since string indexes are case sensitive. For example, if yesterday you had "Apple" and today it was "apple", this wouldn't work.
Also, I'm assuming you're not interested in changing anything about 'price' or 'qty'.
Upvotes: 2