trey hannam
trey hannam

Reputation: 263

Create a dataframe with columns and their unique values in pandas

I have tried looking for a way to create a dataframe of columns and their unique values. I know this has less use cases but would be a great way to get an initial idea of unique values. It would look something like this....

State County City
Colorado Denver Denver
Colorado El Paso Colorado Springs
Colorado Larimar Fort Collins
Colorado Larimar Loveland

Turns into this...

State County City
Colorado Denver Denver
El Paso Colorado Springs
Larimar Fort Collins
Loveland

Upvotes: 0

Views: 1347

Answers (3)

Umar.H
Umar.H

Reputation: 23099

I would use mask and a lambda

df.mask(cond=df.apply(lambda x : x.duplicated(keep='first')), other='')

      State   County              City
0  Colorado   Denver            Denver
1            El Paso  Colorado Springs
2            Larimar      Fort Collins
3                             Loveland

Upvotes: 3

Python16367225
Python16367225

Reputation: 131

Reproducible example. Please add this next time to your future questions to help others answer your question.

import pandas as pd

df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'], 
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland']
})

df

    State     County   City
0   Colorado  Denver   Denver
1   Colorado  El Paso  Colorado Springs
2   Colorado  Larimar  Fort Collins
3   Colorado  Larimar  Loveland

Drop duplicates from each column separately and then concatenate. Fill NaN with empty string.

pd.concat([df[col].drop_duplicates() for col in df], axis=1).fillna('')

    State     County        City
0   Colorado  Denver        Denver
1   El Paso   Colorado      Springs
2   Larimar   Fort Collins
3             Loveland

Upvotes: 1

trey hannam
trey hannam

Reputation: 263

This is the best solution I have come up with, hope to help others looking for something like it!

def create_unique_df(df) -> pd.DataFrame:
    """ take a dataframe and creates a new one containing unique values for each column
    note, it only works for two columns or more

    :param df: dataframe you want see unique values for
    :param type: pandas.DataFrame
    return: dataframe of columns with unique values
    """
    # using list() allows us to combine lists down the line
    data_series = df.apply(lambda x: list( x.unique() ) )

    list_df = data_series.to_frame()

    # to create a df from lists they all neet to be the same leng. so we can append null 
    # values
    # to lists and make them the same length. First find differenc in length of longest list and
    # the rest
    list_df['needed_nulls'] = list_df[0].str.len().max() - list_df[0].str.len()

    # Second create a column of lists with one None value
    list_df['null_list_placeholder'] = [[None] for _ in range(list_df.shape[0])]

    # Third multiply the null list times the difference to get a list we can add to the list of
    # unique values making all the lists the same length. Example: [None] * 3  == [None, None, 
    # None]
    list_df['null_list_needed'] = list_df.null_list_placeholder * list_df.needed_nulls
    list_df['full_list'] = list_df[0] + list_df.null_list_needed

    unique_df = pd.DataFrame(
        list_df['full_list'].to_dict()
    )

    return unique_df

Upvotes: 0

Related Questions