jeangelj
jeangelj

Reputation: 4498

Python Pandas Match Vlookup columns based on header values

I have the following dataframe df:

Customer_ID | 2015 | 2016 |2017 | Year_joined_mailing
ABC            5      6     10     2015
BCD            6      7     3      2016        
DEF            10     4     5      2017
GHI            8      7     10     2016

I would like to look up the value of the customer in the year they joined the mailing list and save it in a new column.

Output would be:

Customer_ID | 2015 | 2016 |2017 | Year_joined_mailing | Purchases_1st_year
ABC            5      6     10     2015                       5
BCD            6      7     3      2016                       7       
DEF            10     4     5      2017                       5
GHI            8      9     10     2016                       9

I have found some solutions for match vlookup in python, but none that would use the headers of other columns.

Upvotes: 9

Views: 5296

Answers (3)

piRSquared
piRSquared

Reputation: 294298

Deprecation Notice: lookup was deprecated in v1.2.0

Use pd.DataFrame.lookup
Keep in mind that I'm assuming Customer_ID is the index.

df.lookup(df.index, df.Year_joined_mailing)

array([5, 7, 5, 7])

df.assign(
    Purchases_1st_year=df.lookup(df.index, df.Year_joined_mailing)
)

             2015  2016  2017  Year_joined_mailing  Purchases_1st_year
Customer_ID                                                           
ABC             5     6    10                 2015                   5
BCD             6     7     3                 2016                   7
DEF            10     4     5                 2017                   5
GHI             8     7    10                 2016                   7

However, you have to be careful with comparing possible strings in the column names and integers in the first year column...

Nuclear option to ensure type comparisons are respected.

df.assign(
    Purchases_1st_year=df.rename(columns=str).lookup(
        df.index, df.Year_joined_mailing.astype(str)
    )
)

             2015  2016  2017  Year_joined_mailing  Purchases_1st_year
Customer_ID                                                           
ABC             5     6    10                 2015                   5
BCD             6     7     3                 2016                   7
DEF            10     4     5                 2017                   5
GHI             8     7    10                 2016                   7

Upvotes: 12

Jeremy Barnes
Jeremy Barnes

Reputation: 662

I would do it like this, assuming that the column headers and the Year_joined_mailing are the same data type and that all Year_joined_mailing values are valid columns. If the datatypes are not the same, you could convert it by adding str() or int() where appropriate.

df['Purchases_1st_year'] = [df[df['Year_joined_mailing'][i]][i] for i in df.index]

What we're doing here is iterating over the indexes in the dataframe to get the 'Year_joined_mailing' field for that index, then using that to get the column we want, and again selecting that index from the column, pushing it all to a list and assigning this to our new column 'Year_joined_mailing'

If your 'Year_joined_mailing' column will not always be a valid column name, then try:

from numpy import nan
new_col = []
for i in df.index:
    try:
        new_col.append(df[df['Year_joined_mailing'][i]][i])
    except IndexError:
        new_col.append(nan) #or whatever null value you want here)
df['Purchases_1st_year'] = new_col

This longer code snippet accomplishes the same thing, but will not break if 'Year_joined_mailing' is not in df.columns

Upvotes: 2

galaxyan
galaxyan

Reputation: 6121

you can apply "apply" to each row

df.apply(lambda x: x[x['Year_joined_mailing']],axis=1)

Upvotes: 3

Related Questions