Reputation: 3722
I'm looking to grab values from one dataframe and append them to another based on index, and to do it fast. Let's say we have the following two dataframes:
df has a bunch of random numbers, with cities as the column headers.
df2 is a dataframe full of cities, with (unimportant) letters as the headers.
In df2, at position df2.loc[0, 'c'], we have Los Angeles. I want to go to df, find the Los Angeles column, take the value at index 0, and append it to that df2 cell so that it reads Los Angeles : 544.
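For a single cell, the rule is two chained label lookups: read the city name from df2, then use it as the column label in df at the same row. A minimal sketch on two hypothetical two-row frames (`small_df`, `small_df2`), reusing values from the df printout:

```python
import pandas as pd

# Hypothetical toy frames with the same layout as df and df2
small_df = pd.DataFrame({'Minneapolis': ['103', '436'],
                         'Los Angeles': ['544', '418']})
small_df2 = pd.DataFrame({'c': ['Los Angeles', 'Minneapolis']})

row = 0
city = small_df2.loc[row, 'c']   # city name stored in df2: 'Los Angeles'
value = small_df.loc[row, city]  # same row, column named by that city: '544'
label = city + ' : ' + value
print(label)  # Los Angeles : 544
```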
import pandas as pd
import numpy as np

np.random.seed(42)
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
cities = ['Minneapolis', 'Boston', 'Chicago', 'San Francisco', 'Los Angeles', 'Bozeman', 'New York', 'Austin',
          'Dallas', 'New Orleans']
# df: 10,000 rows of random integers (stored as strings), one column per city
df = pd.DataFrame({city: np.random.randint(1, 1000, 10000) for city in cities})
df = df.astype(str)
# df2: 10,000 rows of random city names, one column per letter
df2 = pd.DataFrame({k: [np.random.choice(cities) for i in range(10000)] for k in letters})
df
Minneapolis Boston Chicago San Francisco Los Angeles
0 103 720 444 335 544
1 436 833 813 252 418
2 861 140 428 919 339
3 271 792 83 804 801
4 107 546 765 825 320
df2
a b c d e
0 San Francisco Minneapolis Los Angeles Chicago Los Angeles
1 Minneapolis San Francisco Los Angeles Chicago San Francisco
2 Minneapolis Los Angeles Boston Chicago Boston
3 Chicago Chicago Chicago Boston Los Angeles
4 San Francisco San Francisco San Francisco Boston Los Angeles
Expected Result:
df3
a b c \
0 San Francisco : 335 Minneapolis : 103 Los Angeles : 544
1 Minneapolis : 436 San Francisco : 252 Los Angeles : 418
2 Minneapolis : 861 Los Angeles : 339 Boston : 140
3 Chicago : 83 Chicago : 83 Chicago : 83
4 San Francisco : 825 San Francisco : 825 San Francisco : 825
I solved this problem with a double for loop, but it takes too long: my real dataframe is (280k, 260) and I have to run this function twice. I'm looking for a way to do this in NumPy. Ideally, I could create a new NumPy array that holds the numbers as strings in the correct index locations, and then use np.char.add(df2, df)
to concatenate the words and numbers element-wise. I'm just trying to figure out a fast way to index these values.
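import time

s = time.time()
for col in df2.columns:
    for i in range(len(df2)):
        city = df2.loc[i, col]
        try:
            # df already holds strings (df.astype(str)), so no rounding is needed
            df2.loc[i, col] = city + ' : ' + df.loc[i, city]
        except KeyError:
            # skip cells whose city has no column in df
            pass
print(time.time() - s)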
16.95679211616516
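The NumPy idea described in the question (an array of numbers-as-strings aligned with df2, then np.char.add) can be sketched with fancy indexing: translate every city name into a column position with Index.get_indexer, pair it with its row number, and pull all values out of df in one step. This is a minimal sketch, not a benchmarked solution; it assumes df and df2 are row-aligned by position (row i of df2 pairs with row i of df) and that df's column names are unique. The helper `attach_values` and the toy frames are hypothetical.

```python
import numpy as np
import pandas as pd

def attach_values(df, df2):
    """Return a copy of df2 where each cell reads '<city> : <value>', with
    <value> taken from df at the same row, in the column named <city>."""
    names = df2.to_numpy().astype(str)               # city names, shape (n_rows, n_cols)
    col_pos = df.columns.get_indexer(names.ravel())  # column position of each city in df
    if (col_pos == -1).any():                        # -1 marks a city missing from df
        raise KeyError('some cities in df2 have no column in df')
    col_pos = col_pos.reshape(names.shape)
    row_pos = np.arange(len(df2))[:, None]           # row numbers, broadcast across columns
    values = df.to_numpy().astype(str)[row_pos, col_pos]  # numbers as strings, aligned with df2
    joined = np.char.add(np.char.add(names, ' : '), values)
    return pd.DataFrame(joined, index=df2.index, columns=df2.columns)

# Example with hypothetical toy frames
toy_df = pd.DataFrame({'Boston': ['720', '833'], 'Los Angeles': ['544', '418']})
toy_df2 = pd.DataFrame({'a': ['Los Angeles', 'Boston'], 'b': ['Boston', 'Boston']})
result = attach_values(toy_df, toy_df2)
print(result)
```

Every step is a whole-array operation, so there is no Python-level loop over cells; on a 280k × 260 frame, though, the fixed-width NumPy string arrays can use a lot of memory.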
Reputation: 150785
You can use DataFrame.lookup:
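# one row per cell of df2: level_0 = row label, level_1 = column letter, val = city
s = df2.stack().reset_index(name='val')
# fetch df's value at (row, city) for every cell, then reshape back to df2's layout
df2 + ' : ' + df.lookup(s['level_0'], s['val']).reshape(df2.shape)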
On the 10,000-row sample data, this runs in:
44 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Output (first five rows and columns):
a b c \
0 San Francisco : 335 Minneapolis : 103 Los Angeles : 544
1 Minneapolis : 436 San Francisco : 252 Los Angeles : 418
2 Minneapolis : 861 Los Angeles : 339 Boston : 140
3 Chicago : 83 Chicago : 83 Chicago : 83
4 San Francisco : 825 San Francisco : 825 San Francisco : 825
d e
0 Chicago : 444 Los Angeles : 544
1 Chicago : 813 San Francisco : 252
2 Chicago : 428 Boston : 140
3 Boston : 792 Los Angeles : 801
4 Boston : 546 Los Angeles : 320
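DataFrame.lookup, used above, was deprecated in pandas 1.2 and removed in pandas 2.0, so that call fails on current pandas. A minimal label-based stand-in, built on Index.get_indexer and NumPy fancy indexing, is sketched below. The helper name `lookup` is hypothetical (it is not a pandas API), and it assumes the frame's row and column labels are unique.

```python
import pandas as pd

def lookup(frame, row_labels, col_labels):
    """Label-based stand-in for the removed DataFrame.lookup:
    returns frame.loc[r, c] for each (r, c) pair as a NumPy array."""
    rows = frame.index.get_indexer(row_labels)    # label -> position, -1 if missing
    cols = frame.columns.get_indexer(col_labels)
    if (rows == -1).any() or (cols == -1).any():
        raise KeyError('one or more labels not found')
    return frame.to_numpy()[rows, cols]           # pairwise fancy indexing

# Same stack/reshape structure as the answer, on hypothetical toy frames
toy_df = pd.DataFrame({'Boston': ['720', '833'], 'Chicago': ['444', '813']})
toy_df2 = pd.DataFrame({'a': ['Chicago', 'Boston'], 'b': ['Boston', 'Chicago']})
s = toy_df2.stack().reset_index(name='val')
out = toy_df2 + ' : ' + lookup(toy_df, s['level_0'], s['val']).reshape(toy_df2.shape)
print(out)
```

With this helper, the answer's second line becomes df2 + ' : ' + lookup(df, s['level_0'], s['val']).reshape(df2.shape).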