Reputation: 41
I may be overcomplicating this problem, however I can't seem to find a simple solution.
I have two DataFrame's. Let's call them df1 and df2. To keep things simple. Let's say df1 has one column called "Some Data" and df2 has two columns called "some data" and "other data".
Example:
df1
Some Data
"Lebron James 123"
"Lebron James 234"
df2
some data other data
"Lebron James 123 + other text" "I want this in df1["New?"]"
"Michael Jordan" "Doesn't Matter"
So basically I want to create a new column in df1 called "New?". This new column (in df1) will say "New" if df1["Some data"] is in df2["Some other data"]. However, if there is no instance in df2["some data"], then I set the df1["New?"] to that specific row's value in df2["other data"].
Desired result after running:
df1
Some Data New?
"Lebron James 123" "I want this in df1["New?"]"
"Lebron James 234" "New"
So as you can see The New? column would include that specific row's value from the other data column. Lebron James 234 isn't anywhere in some data in df2 so it says new.
I am able to get it to say True or False using the .isin()
method, however don't know how to grab the index of the other df and get the value from the other data column.
Thank you
EDIT:
From what I know will work
df["New?"] = df1["Some Data"].isin(df2["some data"])
Would render
df1["New?"]
True
False
So I want True to be the "I want this in df1["New?"]" and False to be New
Upvotes: 1
Views: 61
Reputation: 59274
Based on your info, seems like you need only a simple np.where
(if dfs
have same length)
df1['New?'] = np.where(df1["Some Data"].isin(df2["some data"]), df2['other data'], 'New')
Some Data New?
0 Lebron James 123 + other text I want this in df1[New?"]"
1 Lebron James 234 New
For different length,
mask = df2["some data"].isin(df["Some Data"]).values
df.loc[mask,'New'] = df2.loc[mask, 'other data']
df.fillna('New')
Explanation
Basically you have a mask, and you use the same mask to filter both data frames. This yields the same number of results on both dfs
given the descriptions, and you assign the filtered rows' "other data" values from df2
to the same matching rows in df
"some data"
Upvotes: 1
Reputation: 51165
First create a regular expression by joining your df1
series:
rgx = '|'.join(df1['some data'])
Now using np.where
:
df1.assign(data=np.where(df2['some data'].str.match(rgx), df2['other data'], 'New'))
some data data
0 Lebron James 123 I want this in df1["New?"]
1 Lebron James 234 New
An example with mismatching shapes:
df1 = pd.DataFrame({'a': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'a': ['aaaaa', 'bbbb', 'ffff', 'gggg', 'hhhh']})
rgx = '({})'.format('|'.join(df1.a))
m = df2.assign(flag=df2.a.str.extract(rgx))
df1.set_index('a').join(m.set_index('flag')).fillna('New').reset_index()
index a
0 a aaaaa
1 b bbbb
2 c New
3 d New
Upvotes: 1