Atom Store

Reputation: 1006

How to group by the words and create an equivalent column consisting of float values? (Pandas)

I have a dataframe:

   Text                 
   Background  
   Clinical      
   Method
   Direct
   Background
   Direct

Now I want to add a new column that assigns each row a group number based on its first word: for example, Background belongs to group 1, Clinical belongs to group 2, and so on.

The expected output:

a dataframe:

   Text            Group      
   Background       1
   Clinical         2
   Method           3
   Direct           4
   Background       1
   Direct           4

Upvotes: 0

Views: 47

Answers (3)

nikeros

Reputation: 3379

A solution could be the following:

import pandas as pd

data = pd.DataFrame([["A B", 1], ["A C", 2], ["B A", 3], ["B C", 5]], columns=("name", "value"))
# Group rows by the first word of the "name" column
grouped = data.groupby(by=[x.split(" ")[0] for x in data.loc[:, "name"]])

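To group by the first few words instead, use " ".join(x.split(" ")[:NUMBER_OF_WORDS]) as the key (the slice on its own is a list, which cannot serve as a group key). You can then apply the aggregation you want to the resulting groupby object.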
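As a rough sketch of how this groupby could also produce the group numbers asked for in the question: `ngroup()` numbers each group starting from 0, and `sort=False` keeps the groups in order of first appearance. The names `first_word`, `group_ids` and `totals` are only illustrative, and the sum is just one possible aggregation.

```python
import pandas as pd

data = pd.DataFrame(
    [["A B", 1], ["A C", 2], ["B A", 3], ["B C", 5]],
    columns=("name", "value"),
)

# First word of each entry in "name"; renamed so it does not clash with the column
first_word = data["name"].str.split(" ").str[0].rename("first_word")

# sort=False keeps groups in order of first appearance
grouped = data.groupby(first_word, sort=False)

# ngroup() numbers the groups from 0, so add 1 to start at 1
group_ids = grouped.ngroup() + 1

# Example aggregation: sum of "value" per first word
totals = grouped["value"].sum()

data["Group"] = group_ids
print(data)
print(totals)
```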

Upvotes: 0

Stephan Kulla

Reputation: 5067

Idea: Make a list of unique values of the column Text and for the column Group you can assign the index of the value in this unique list. Code example:

import pandas as pd

df = pd.DataFrame({"Text": ["Background", "Clinical", "Clinical", "Method", "Background"]})

# List of unique values of column `Text`
groups = list(df["Text"].unique())

# Assign each value in `Text` its index
# (you can write `groups.index(text) + 1` when the first value shall be 1)
df["Group"] = df["Text"].map(lambda text: groups.index(text))

# Output for df
print(df)

### Result:
         Text  Group
0  Background      0
1    Clinical      1
2    Clinical      1
3      Method      2
4  Background      0
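The same idea (index of each value in the list of unique values) is what `pd.factorize` does in one call, so a possible shortcut looks like the sketch below. It uses the data from the question and assumes the group should be decided by the first whitespace-separated word; `first_word`, `codes` and `uniques` are just illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {"Text": ["Background", "Clinical", "Method", "Direct", "Background", "Direct"]}
)

# Take the first word of each entry (a single-word entry is returned unchanged)
first_word = df["Text"].str.split().str[0]

# factorize returns integer codes in order of first appearance, starting at 0,
# together with the unique values those codes refer to
codes, uniques = pd.factorize(first_word)

# Shift by 1 so the first group is 1, as in the expected output
df["Group"] = codes + 1

print(df)
```

Unlike `groups.index(text)`, this does not search the list again for every row, which may matter on large frames.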

Upvotes: 0

Tomer S

Reputation: 1030

Try this:

import pandas as pd

text = ['Background', 'Clinical', 'Method', 'Direct', 'Background', 'Direct']
df = pd.DataFrame(text, columns=['Text'])


def create_idx_map():
    idx = 1
    values = {}
    for item in list(df['Text']):
        if item not in values:
            values[item] = idx
            idx += 1
    return values

values = create_idx_map()
df['Group'] = [values[x] for x in list(df['Text'])]

print(df)
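A more compact variant of the same mapping, as a sketch: build the dictionary from the de-duplicated column with `enumerate`, then apply it with `Series.map`. Since the question title mentions float values, the last line shows one way to get them; the `GroupFloat` column name is made up for this example.

```python
import pandas as pd

text = ["Background", "Clinical", "Method", "Direct", "Background", "Direct"]
df = pd.DataFrame(text, columns=["Text"])

# drop_duplicates keeps the first occurrence of each value, in original order,
# so enumerate(..., start=1) numbers the values by first appearance
mapping = {value: idx for idx, value in enumerate(df["Text"].drop_duplicates(), start=1)}

# Look up each row's group number in the mapping
df["Group"] = df["Text"].map(mapping)

# Optional: the same numbers as floats
df["GroupFloat"] = df["Group"].astype(float)

print(df)
```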

Upvotes: 1
