XYZ

Reputation: 225

How to convert each word in each row of a dataframe to a numeric value

I have been given this dataframe:

[image: input dataframe]

My desired output, produced with a dictionary, is shown further down.

**Given the following dictionary:**
d = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}

[image: desired output dataframe]

How can I do this in Python? I tried the following, but it failed:

df['new'] = df['documents'].apply(lambda x: d[x])

Any help would be appreciated. Thanks in advance.

Upvotes: 1

Views: 249

Answers (2)

Stef

Reputation: 15504

Instead of searching for d[x] where x is the whole sentence, you should search for d[w] for every word w in the sentence x.
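To see why the original attempt fails, here is a minimal sketch in plain Python (no pandas needed). The names `sentence`, `values` and `whole_sentence_found` are only illustrative and do not come from the question:

```python
d = {'I': 30, 'am': 45, 'good': 90, 'boy': 50, 'We': 100, 'are': 70, 'going': 110}

sentence = 'I am good boy'

# Looking up the whole sentence fails: the sentence itself is not a key of d.
try:
    d[sentence]
    whole_sentence_found = True
except KeyError:
    whole_sentence_found = False

# Looking up each word separately succeeds, because every word is a key.
values = [d[w] for w in sentence.split()]

print(whole_sentence_found, values)
```

The lambda passed to .apply() receives one full sentence per row, so it has to do the per-word lookup itself.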

You can split a string into a list of words with .split(), then use a list comprehension or map to look up each word in the dictionary:

import pandas as pd

df = pd.DataFrame({'id': range(3), 'documents': ['I am good boy', 'We are going', 'I am going']})

print(df)
#    id      documents
# 0   0  I am good boy
# 1   1   We are going
# 2   2     I am going

d = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}

df['new'] = df['documents'].apply(lambda s: list(map(d.get, s.split())))

# or alternatively:
# df['new'] = df['documents'].apply(lambda s: [d.get(w) for w in s.split()])

print(df)
#    id      documents               new
# 0   0  I am good boy  [30, 45, 90, 50]
# 1   1   We are going    [100, 70, 110]
# 2   2     I am going     [30, 45, 110]

Important note: I suggest using d.get(w) rather than d[w]. If w is not in the dictionary, d[w] raises a KeyError, whereas d.get(w) returns a default value instead. That default is None unless you specify your own as the second argument:

df = pd.DataFrame({'id': range(4), 'documents': ['I am good boy', 'We are going', 'I am going', 'I am good words not going in dictionary']})

df['new'] = df['documents'].apply(lambda s: [d.get(w, 37) for w in s.split()])

print(df)
#    id                                documents                                new
# 0   0                            I am good boy                   [30, 45, 90, 50]
# 1   1                             We are going                     [100, 70, 110]
# 2   2                               I am going                      [30, 45, 110]
# 3   3  I am good words not going in dictionary  [30, 45, 90, 37, 37, 110, 37, 37]
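If you would rather drop unknown words than substitute a placeholder value, one possible variant is to filter inside the list comprehension. This is only a sketch under that assumption; the sample sentence 'I am not going' is made up for illustration:

```python
import pandas as pd

d = {'I': 30, 'am': 45, 'good': 90, 'boy': 50, 'We': 100, 'are': 70, 'going': 110}

# Hypothetical sample data; 'not' has no entry in d.
df = pd.DataFrame({'id': range(2), 'documents': ['I am good boy', 'I am not going']})

# Keep only the words that have an entry in d; unknown words are skipped.
df['new'] = df['documents'].apply(lambda s: [d[w] for w in s.split() if w in d])

print(df['new'].tolist())
```

Note that with this variant the resulting list can be shorter than the sentence, so positions no longer line up with the original words.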

Upvotes: 1

Corralien

Reputation: 120391

You can use explode to get one word per row, map the words through your dict, and then regroup them by the original row index:

MAPPING = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}

df['documents'] = (df['documents'].str.split().explode().map(MAPPING).astype(str)
                                  .groupby(level=0).agg(list).str.join(' '))
print(df)

# Output
   id    documents
0   0  30 45 90 50
1   1   100 70 110
2   2    30 45 110

Step by step

Phase 1: Explode

# Split phrase into words
>>> out = df['documents'].str.split()
0    [I, am, good, boy]
1      [We, are, going]
2        [I, am, going]
Name: documents, dtype: object

# Explode lists into scalar values
>>> out = out.explode()
0        I
0       am
0     good
0      boy
1       We
1      are
1    going
2        I
2       am
2    going
Name: documents, dtype: object

Phase 2: Transform

# Map each word to its value with your dict, then cast the values to strings
>>> out = out.map(MAPPING).astype(str)
0     30
0     45
0     90
0     50
1    100
1     70
1    110
2     30
2     45
2    110
Name: documents, dtype: object  # <- .astype(str)

Phase 3: Reshape

# Group by index (level=0) then aggregate to a list
>>> out = out.groupby(level=0).agg(list)
0    [30, 45, 90, 50]
1      [100, 70, 110]
2       [30, 45, 110]
Name: documents, dtype: object

# Join each list into a single space-separated string
>>> out = out.str.join(' ')
0    30 45 90 50
1     100 70 110
2      30 45 110
Name: documents, dtype: object
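One caveat with this pipeline: Series.map with a dict produces NaN for any word that is not a key, which typically also turns the mapped values into floats before the string conversion. A possible way around this, sketched below, is to map with a function that calls MAPPING.get with a fallback. The DEFAULT value and the second sample sentence are assumptions made for illustration, not part of the original answer:

```python
import pandas as pd

MAPPING = {'I': 30, 'am': 45, 'good': 90, 'boy': 50, 'We': 100, 'are': 70, 'going': 110}
DEFAULT = 0  # hypothetical placeholder for words missing from MAPPING

# Hypothetical sample data; 'hello' and 'world' are not in MAPPING.
df = pd.DataFrame({'id': range(2), 'documents': ['I am good boy', 'hello world going']})

out = (df['documents'].str.split().explode()
       .map(lambda w: MAPPING.get(w, DEFAULT))  # always an int, never NaN
       .astype(str)
       .groupby(level=0).agg(' '.join))         # rebuild one string per original row

print(out.tolist())
```

Mapping with a function is usually slower than mapping with a dict, so this trade-off is only worth making when unknown words can actually occur.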

Upvotes: 3
