Reputation: 384
I am creating a function that compares a dataframe (DF) to a series (S) and eventually returns a new dataframe. The common column is 'name'. I want the function to return a dataframe with the same number of rows as the series (S) and the same number of columns as the df. The function will search name columns in the df and find all of the matching names in the series (S). If a match is found I want a new row of a new dataframe to be created that matches the df row for that specific name. If a match is not found I want a new row to be created for the result dataframe regardless but to include all 0.0 for the cells for that particular row. I've been trying to figure this out for the past 6 hours. I'm having issues with broadcasting I believe. Here is what I have tried.
Here is some sample data
Series:
S[500:505]
500 Nanotechnology
501 Music
502 Logistics & Supply Chain
503 Computer & Network Security
504 Computer Software
Name: name, dtype: object
DataFrame: NOTE: there is a column called name which is also industries. So row =0 here is Defense & Space in the name column.
Defense & Space Computer Software Internet Semiconductors \
0 1.0 0.0 0.0 0.0
1 0.0 1.0 0.5 0.5
2 0.0 0.5 1.0 0.5
3 0.0 0.5 0.5 1.0
4 0.5 0.0 0.0 0.0
S.shape = (31454,)
df.shape = (100,101)
Generate an empty dataframe with all zeros
all_zeros = np.zeros((len(S),len(df.columns)))
Put the numpy array into a dataframe
result = pd.DataFrame(data = all_zeros, columns=df.columns, index = range(len(s)))
I don't want the name column to be in the final result
result = result.drop('name', axis=1)
Build a function to be used in a lambda to set the new values for the result dataframe
def set_cell_values(row):
return df.iloc[1,:]
Here is the part where I set the new values for the new dataframe
for index in range(len(df)):
names_are_equal = df['name'][index] == result['name']
map(lambda x: set_cell_values(row), result[names_are_equal]))
To me this makes sense but it seems not to be working. Is there an easy way to make this work that I am unaware of? The map is there because I needed to broadcast the df row into the new dataframe at several rows (not just once).
Upvotes: 0
Views: 218
Reputation: 5356
Don,
So, let's go:
# with this tables
In [66]: S
Out[66]:
0 aaa
1 bbb
2 ccc
3 ddd
4 eee
Name: name, dtype: object
In [84]: df
Out[84]:
a b c name
0 39 71 55 aaa
1 9 57 6 bbb
2 72 22 52 iii
3 68 97 81 jjj
4 30 64 78 kkk
# transform the series to a dataframe
Sd = pd.DataFrame(S)
# merge them with outer join (will keep both tables columns and values).
# fill the NAs with 0
In [86]: pd.merge(Sd,df, how='outer').fillna(0)
Out[86]:
name a b c
0 aaa 39 71 55
1 bbb 9 57 6
2 ccc 0 0 0
3 ddd 0 0 0
4 eee 0 0 0
5 iii 72 22 52
6 jjj 68 97 81
7 kkk 30 64 78
is that it?
Upvotes: 1