Modify a DataFrame column with a list comprehension

I have a list of about 90k strings and a DataFrame with several columns. I want to check whether each string in the list appears in column_1 and, if it does, assign that same value to column_2.

I can do this:

for i in range(len(my_list)):
    item = my_list[i]
    for j in range(len(df)):
        if item == df['column_1'][j]:
            df.loc[j, 'column_2'] = item

But I would prefer to avoid the nested loops.

I tried this:

for item in my_list:
    if item in list(df['column_1']):
        position = df[df['column_1'] == item].index.values[0]
        df.loc[position, 'column_2'] = item

but I think this solution is even slower and harder to read. Can this operation be done with a simple list comprehension?

Edit.

The second solution is considerably faster, by about an order of magnitude. Why is that? It seems that in that case it has to search for the match twice:

here:

if item in list(df['column_1']):

and here:

position = df[df['column_1'] == item].index.values[0]

Still, I would prefer a simpler solution.

Upvotes: 5

Views: 12091

Answers (4)

MaxU - stand with Ukraine

Reputation: 210842

Try this one-liner:

df.loc[(df['column_1'].isin(my_list)), 'column_2'] = df['column_1']

The difference from @res_edit's solution is the lack of the second df.loc[], which should make it a bit faster.
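A minimal sketch of the one-liner on toy data. The frame and list below are invented for illustration, and column_2 is assumed to already exist with a placeholder value:

```python
import pandas as pd

# Hypothetical toy data, not taken from the question
my_list = ['foo', 'bar']
df = pd.DataFrame({
    'column_1': ['some', 'foo', 'string', 'bar'],
    'column_2': ['-', '-', '-', '-'],
})

# Boolean mask: True where column_1 holds a value from my_list
mask = df['column_1'].isin(my_list)

# The right-hand side is the whole column; pandas aligns it on the index,
# so only the rows selected by the mask are overwritten
df.loc[mask, 'column_2'] = df['column_1']

result = df['column_2'].tolist()
print(result)
```

Rows that do not match keep whatever column_2 held before.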

Upvotes: 1

Kevin

Reputation: 8207

Let's say you have a list:

l = ['foo', 'bar']

and a DataFrame:

df = pd.DataFrame(['some', 'short', 'string', 'has', 'foo'], columns=['col1'])

You can use df.apply:

df['col2'] = df.apply(lambda x: x['col1'] if x['col1'] in l else None, axis=1)

df
    col1    col2
0   some    None
1   short   None
2   string  None
3   has     None
4   foo     foo
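With a list of about 90k strings, as in the question, `x['col1'] in l` scans the list once per row. A possible variation, sketched under the assumption that the strings are hashable (plain Python strings are), converts the list to a set first and maps over the single column instead of applying a function row by row. The name `lookup` is introduced here for illustration only:

```python
import pandas as pd

l = ['foo', 'bar']
df = pd.DataFrame(['some', 'short', 'string', 'has', 'foo'], columns=['col1'])

# Set membership is a hash lookup rather than a scan of the whole list
lookup = set(l)

# Map over the one column that is needed; non-matching rows get a missing value
df['col2'] = df['col1'].map(lambda value: value if value in lookup else None)

print(df)
```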

Upvotes: 3

res_edit

Reputation: 183

You can do this by splitting the filtering and assignment actions you described into two distinct steps.

Pandas Series objects have an isin method that identifies the rows whose column_1 values are in my_list and returns the result as a boolean Series. That Series can then be used with the .loc indexer to copy the values in the matching rows from column_1 to column_2:

# Identify the matching rows
matches = df['column_1'].isin(my_list)
# Set the column_2 entries to column_1 in the matching rows
df.loc[matches,'column_2'] = df.loc[matches,'column_1']

If column_2 doesn't already exist, this approach creates it and sets the non-matching values to NaN. The .loc indexer is used to avoid operating on a copy of the data when performing the indexing operations.
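A small sketch of that last point, using invented toy data in which column_2 does not exist before the assignment:

```python
import pandas as pd

# Hypothetical toy data; note that there is no column_2 yet
my_list = ['foo', 'bar']
df = pd.DataFrame({'column_1': ['some', 'foo', 'string', 'bar']})

# Identify the matching rows
matches = df['column_1'].isin(my_list)

# Assigning through .loc creates column_2; rows outside the mask stay missing
df.loc[matches, 'column_2'] = df.loc[matches, 'column_1']

missing = df['column_2'].isna().tolist()
print(df)
```

Here `missing` flags the rows that were left unfilled, which are exactly the rows where `matches` is False.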

Upvotes: 5

Stop harming Monica

Reputation: 12610

According to conventional wisdom you shouldn't be using a list comprehension for side effects. You'd be creating a (maybe huge) list that you don't need, wasting resources and hurting readability.

https://codereview.stackexchange.com/questions/58050/is-it-pythonic-to-create-side-effect-inside-list-comprehension
Is it Pythonic to use list comprehensions for just side effects?
Python loops vs comprehension lists vs map for side effects (i.e. not using return values)
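If a comprehension is still wanted, one option that stays within that advice is to have it build the new column values and assign the result once, rather than mutate the frame from inside the comprehension. A rough sketch on invented data, where `wanted` is a hypothetical helper name and column_2 is assumed to exist already:

```python
import pandas as pd

# Hypothetical toy data, not taken from the question
my_list = ['b', 'z']
df = pd.DataFrame({
    'column_1': ['a', 'b', 'c'],
    'column_2': ['x', 'y', 'z'],
})

# A set makes each membership test a hash lookup
wanted = set(my_list)

# The comprehension produces a value for every row and has no side effects;
# the single assignment afterwards is the only mutation
df['column_2'] = [
    c1 if c1 in wanted else c2
    for c1, c2 in zip(df['column_1'], df['column_2'])
]

print(df['column_2'].tolist())
```

The vectorised isin approach from the other answers is likely still the simpler choice; this only shows the shape a side-effect-free comprehension could take.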

Upvotes: 1
