Reputation: 45
I have a CSV file imported as a pandas DataFrame, with filenames in one column. I have another file, a NumPy array, that contains the same filenames but at different indexes. How can I iterate over the filenames in the DataFrame, find each match in the NumPy array, and extract the index at which that filename sits in the array?
So for example:
>>> d = {'col1': ["Apple", "Peach"], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
    col1  col2
0  Apple     3
1  Peach     4
>>> b = np.array(["Apple", "Banana", "Pear", "Peach"])
>>> b
array(['Apple', 'Banana', 'Pear', 'Peach'], dtype='<U6')
Now I would like to know, for every item in the df, at which index it sits in the array, so I can append something at that position in another array.
I have tried something like this:
for i,j in df:
    if j in b:
        print(b.get_loc)
Upvotes: 0
Views: 600
Reputation: 23099
IIUC, we can turn both the array and the DataFrame column into dicts keyed by index, then use a function to find the matching pairs:
import collections as colls
import numpy as np
import pandas as pd
d = {'col_1': ['Apple', 'Peach'], 'col_2': [3, 4]}
df = pd.DataFrame(data=d)
b = np.array(['Apple', 'Banana', 'Pear', 'Peach'])
d_1 = df['col_1'].to_dict()
d_2 = dict(enumerate(b))
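def dicts_to_tuples(*dicts):
    # Group the keys (indices) of every dict by their shared value (the name).
    result = colls.defaultdict(list)
    for curr_dict in dicts:
        for k, v in curr_dict.items():
            result[v].append(k)
    # Keep only the names that were found at more than one index.
    return [tuple(v) for v in result.values() if len(v) > 1]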
print(d_1) # {0: 'Apple', 1: 'Peach'}
print(d_2) # {0: 'Apple', 1: 'Banana', 2: 'Pear', 3: 'Peach'}
print(dicts_to_tuples(d_1, d_2)) # [(0, 0), (1, 3)]
The rest is up to you.
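As a rough sketch of one way to use those index pairs, assuming the goal is to write each `col_2` value into a second array at the matched position. The name `other` is hypothetical (it is not in the question), and the pairs are hard-coded from the output printed above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': ['Apple', 'Peach'], 'col_2': [3, 4]})

# (row in df, position in b), copied from the dicts_to_tuples output above.
pairs = [(0, 0), (1, 3)]

# Hypothetical target array with one slot per element of b.
other = np.zeros(4, dtype=int)

for df_row, b_pos in pairs:
    # Write the df value into the slot where its name sits in b.
    other[b_pos] = df.loc[df_row, 'col_2']

print(other)  # [3 0 0 4]
```

This only reads the tuples as (df index, array index) when each name occurs once in the df and once in the array.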
You could even turn the array into a DataFrame and perform a merge:
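df2 = pd.DataFrame({'col_1': b})
# Keep each frame's original index as a column so both positions survive the merge.
merge_ = pd.merge(df.reset_index(), df2.reset_index(), on='col_1', how='inner', suffixes=('_df', '_b'))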
Upvotes: 1
Reputation: 7224
Does this solution work? If you don't need to know the corresponding key, this gives just the list of indexes:
mask = np.isin(b, df['col1'])  # True where an element of b appears in df['col1']; np.in1d is deprecated in NumPy 2.0
idx = np.arange(len(mask))
idx[mask]
# array([0, 3])
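You can also do this to get a dict of the locations (the assignment below relies on every name in df occurring exactly once in b, in the same order as in df):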
df['idx'] = idx[mask]
df.set_index('idx')['col1'].to_dict()
# {0: 'Apple', 3: 'Peach'}
df.set_index('col1')['idx'].to_dict()
# {'Apple': 0, 'Peach': 3}
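If the names in the DataFrame are in a different order than in the array, or some are missing from it, a per-row lookup may be safer than a mask. The following is a minimal sketch using `pd.Index.get_indexer`, which is a different technique from the mask above and assumes the names in `b` are unique. The extra `Kiwi` row and the `positions` name are made up for illustration:

```python
import numpy as np
import pandas as pd

# Deliberately out of order, with one name ('Kiwi') that is not in b.
df = pd.DataFrame({'col1': ['Peach', 'Apple', 'Kiwi'], 'col2': [4, 3, 5]})
b = np.array(['Apple', 'Banana', 'Pear', 'Peach'])

# For each row of df, the position of its name in b; -1 means "not found".
positions = pd.Index(b).get_indexer(df['col1'])

print(positions.tolist())  # [3, 0, -1]
```

The result is aligned with the rows of `df`, so it can be stored with `df['idx'] = positions` and the `-1` rows filtered out before indexing into another array.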
Upvotes: 1