Reputation: 419
I am new to Python and programming in general.
I am trying to figure out how to return a comma-separated value at the corresponding position within a different column in pandas, and store this output in a new column. See my example below:
key_list = ['cat', 'dog', 'pig']
A          B
---------------------
1          cat
6, 2       dog, cat
8, 3, 1    pig, dog, cat
I want an output which is the following:
A          B                cat_result    dog_result    pig_result
----------------------------------------------------------------
1          cat              1             NaN           NaN
6, 2       dog, cat         2             6             NaN
8, 3, 1    pig, dog, cat    1             3             8
So, I would like to check for the presence of each key (cat, dog or pig) in column B and, if it is present, return the value from column A that sits at the same position in that row's comma-separated list.
So far I have this:
for key in key_list:
    df["{}_result".format(key)] = df.apply(lambda _: int(key in _.B), axis=1)
This creates a new column for each key and gives a 1 if the key is present within B or a 0 if not. I am not sure where to go from here, or whether this is the right approach. Any help is much appreciated. Thanks!
Upvotes: 1
Views: 56
Reputation: 294218
I use np.core.defchararray.split in a lambda to help split the column's values. I could have used pd.Series.str.split, but I opted for this.
Then I use the lambda and iterate row by row to create a list of dictionaries. That list of dictionaries can then be passed to the pd.DataFrame constructor.
Finally, I use join to attach the result to the original dataframe.
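import numpy as np
import pandas as pd

# Split every cell of a column on ', ' into a list of tokens.
# (np.char.split is the public alias of np.core.defchararray.split.)
s = lambda x: np.core.defchararray.split(x.values.astype(str), ', ')
# One dict per row, mapping each key in B to the value at the same position in A.
df.join(
    pd.DataFrame(
        [dict(zip(*t)) for t in zip(s(df.B), s(df.A))]
    ).add_suffix('_result')
)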
         A              B cat_result dog_result pig_result
0        1            cat          1        NaN        NaN
1     6, 2       dog, cat          2          6        NaN
2  8, 3, 1  pig, dog, cat          1          3          8
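For comparison, here is a minimal sketch of the pd.Series.str.split route mentioned above. It assumes both columns hold strings and that ', ' is the separator; the sample frame is reconstructed from the question's expected output, and the names keys, values, rows and result are only illustrative.

```python
import pandas as pd

# Sample frame reconstructed from the example in the question (assumed string columns).
df = pd.DataFrame({
    "A": ["1", "6, 2", "8, 3, 1"],
    "B": ["cat", "dog, cat", "pig, dog, cat"],
})

# Split both columns on ", " so each cell becomes a list of tokens.
keys = df["B"].str.split(", ")
values = df["A"].str.split(", ")

# Pair each key with the value at the same position, one dict per row.
rows = [dict(zip(k, v)) for k, v in zip(keys, values)]

# Keys missing from a row become NaN; reusing df.index keeps the join aligned.
result = df.join(pd.DataFrame(rows, index=df.index).add_suffix("_result"))
print(result)
```

The values in the new columns stay as strings here; if numbers are needed, something like pd.to_numeric could be applied to each *_result column afterwards.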
Upvotes: 1