Reputation: 10373
I have a list of about 90k strings and a DataFrame with several columns. I want to check whether each string in the list appears in column_1 and, if it does, assign the same value to column_2.
I can do this:
for i in range(len(my_list)):
    item = my_list[i]
    for j in range(len(df)):
        if item == df['column_1'][j]:
            df['column_2'][j] = item
But I would prefer to avoid the nested loops.
I tried this:
for item in my_list:
    if item in list(df['column_1']):
        position = df[df['column_1'] == item].index.values[0]
        df['column_2'][position] = item
but I think this solution is even slower and harder to read. Can this operation be done with a simple list comprehension?
Edit.
The second solution is considerably faster, by about an order of magnitude. Why is that? It seems that in that case it has to search twice for the match:
here:
if item in list(df['column_1']):
and here:
position = df[df['column_1'] == item].index.values[0]
Still, I would prefer a simpler solution.
Upvotes: 5
Views: 12091
Reputation: 210842
Try this one-liner:
df.loc[(df['column_1'].isin(my_list)), 'column_2'] = df['column_1']
The difference from @res_edit's solution is the lack of the second df.loc[], which should make it a bit faster...
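As a minimal runnable sketch of the one-liner, the frame and list below are made-up sample data (not the asker's real data). It relies on pandas aligning the right-hand Series on the index, so only the masked rows are written:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's frame and list.
df = pd.DataFrame({
    "column_1": ["a", "b", "c", "d"],
    "column_2": ["", "", "", ""],
})
my_list = ["b", "d", "z"]

# Boolean mask: True where the column_1 value appears in my_list.
mask = df["column_1"].isin(my_list)

# Assign only on the masked rows; the right-hand side is aligned on the
# index, so rows outside the mask keep their previous column_2 value.
df.loc[mask, "column_2"] = df["column_1"]

result = df["column_2"].tolist()
print(result)
```

Strings in my_list that never occur in column_1 (here "z") are simply ignored.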
Upvotes: 1
Reputation: 8207
Let's say you have a list:
l = ['foo', 'bar']
and a DataFrame:
import pandas as pd
df = pd.DataFrame(['some', 'short', 'string', 'has', 'foo'], columns=['col1'])
You can use df.apply:
df['col2'] = df.apply(lambda x: x['col1'] if x['col1'] in l else None, axis=1)
df
     col1  col2
0    some  None
1   short  None
2  string  None
3     has  None
4     foo   foo
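For comparison, here is a sketch of a vectorized variant of the same idea using Series.where together with isin. This is a swapped-in alternative, not the approach above, and it differs in one respect: non-matching rows get a missing value (NaN) rather than None.

```python
import pandas as pd

l = ['foo', 'bar']
df = pd.DataFrame(['some', 'short', 'string', 'has', 'foo'], columns=['col1'])

# Keep col1 where the value is in l; everywhere else Series.where
# substitutes a missing value instead of None.
df['col2'] = df['col1'].where(df['col1'].isin(l))

print(df)
```

This avoids calling a Python lambda once per row, which may matter on large frames.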
Upvotes: 3
Reputation: 183
You can do this by splitting the filtering and assignment actions you described into two distinct steps.
Pandas Series objects include an isin method that lets you identify the rows whose column_1 values are in my_list and stores the result in a boolean-valued Series. That Series can in turn be used with the .loc indexer to copy the values from the matching rows of column_1 to column_2.
# Identify the matching rows
matches = df['column_1'].isin(my_list)
# Set the column_2 entries to column_1 in the matching rows
df.loc[matches,'column_2'] = df.loc[matches,'column_1']
If column_2 doesn't already exist, this approach creates column_2 and sets the non-matching values to NaN. The .loc method is used to avoid operating on a copy of the data when performing the indexing operations.
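A small sketch of the case where column_2 does not exist yet, using made-up sample data (the three-row frame and one-element list are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical frame with no column_2 yet.
df = pd.DataFrame({"column_1": ["x", "y", "z"]})
my_list = ["y"]

# Identify the matching rows
matches = df["column_1"].isin(my_list)

# column_2 is created by this assignment; rows outside the mask are
# left as missing values.
df.loc[matches, "column_2"] = df.loc[matches, "column_1"]

print(df)
```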
Upvotes: 5
Reputation: 12610
According to conventional wisdom you shouldn't be using a list comprehension for side effects. You'd be creating a (maybe huge) list that you don't need, wasting resources and hurting readability.
See also:
https://codereview.stackexchange.com/questions/58050/is-it-pythonic-to-create-side-effect-inside-list-comprehension
Is it Pythonic to use list comprehensions for just side effects?
Python loops vs comprehension lists vs map for side effects (i.e. not using return values)
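To illustrate the point with a toy example (the names seen, seen_loop and items are made up for this sketch): a comprehension used only for its side effect still builds a list of return values that nobody needs, whereas a plain loop does not.

```python
items = ["a", "b", "c"]

# Comprehension used only for its side effect: list.append returns None,
# so this still builds a throwaway list with one None per item.
seen = []
throwaway = [seen.append(x) for x in items]

# Plain loop: same side effect, no extra list is created.
seen_loop = []
for x in items:
    seen_loop.append(x)

print(throwaway)
```

With 90k items the throwaway list would hold 90k None entries.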
Upvotes: 1