Reputation: 225
This dataframe is given to me.
My desired output using a dictionary is like this
**Given the following dictionary:**
d = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}
How can I do this in Python? I tried the following, but it failed :(
dataframe['new'] = data['documents'].apply(lambda x: dictionary[x])
Kindly help me out. Thanks in advance.
Upvotes: 1
Views: 249
Reputation: 15504
Instead of looking up d[x], where x is the whole sentence, you should look up d[w] for every word w in the sentence x.
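To make the difference concrete, here is a minimal sketch in plain Python, with no pandas involved. It assumes the dictionary from the question; the variable names sentence, whole_lookup_failed and per_word are only illustrative.

```python
# Dictionary from the question: keys are single words.
d = {'I': 30, 'am': 45, 'good': 90, 'boy': 50, 'We': 100, 'are': 70, 'going': 110}
sentence = 'I am good boy'

# Looking up the whole sentence fails, because no key equals the full string.
try:
    d[sentence]
    whole_lookup_failed = False
except KeyError:
    whole_lookup_failed = True

# Looking up each word separately succeeds.
per_word = [d[w] for w in sentence.split()]

print(whole_lookup_failed)
print(per_word)
```

This is the same failure the lambda in the question runs into, since apply passes each full sentence to it.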
You can split a string into a list of words using .split(). Then you can use a list comprehension, or map, to look up every word of the list in the dictionary:
import pandas as pd
df = pd.DataFrame({'id': range(3), 'documents': ['I am good boy', 'We are going', 'I am going']})
print(df)
# id documents
# 0 0 I am good boy
# 1 1 We are going
# 2 2 I am going
d = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}
df['new'] = df['documents'].apply(lambda s: list(map(d.get, s.split())))
# or alternatively:
# df['new'] = df['documents'].apply(lambda s: [d.get(w) for w in s.split()])
print(df)
# id documents new
# 0 0 I am good boy [30, 45, 90, 50]
# 1 1 We are going [100, 70, 110]
# 2 2 I am going [30, 45, 110]
Important note: I suggest using d.get(w) rather than d[w]. If w is not in the dictionary, d[w] raises a KeyError. d.get, however, accepts a default value and returns it instead of raising. By default, d.get(w) returns None if w is not in d, but you can specify the default value yourself:
df = pd.DataFrame({'id': range(4), 'documents': ['I am good boy', 'We are going', 'I am going', 'I am good words not going in dictionary']})
df['new'] = df['documents'].apply(lambda s: [d.get(w, 37) for w in s.split()])
print(df)
# id documents new
# 0 0 I am good boy [30, 45, 90, 50]
# 1 1 We are going [100, 70, 110]
# 2 2 I am going [30, 45, 110]
# 3 3 I am good words not going in dictionary [30, 45, 90, 37, 37, 110, 37, 37]
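For reference, the behaviour of d.get can also be checked in isolation. The sketch below assumes the same dictionary. The last line is a variation not taken from the answer above: it drops unknown words instead of substituting a default, which may or may not be what you want.

```python
d = {'I': 30, 'am': 45, 'good': 90, 'boy': 50, 'We': 100, 'are': 70, 'going': 110}

known = d.get('good')          # key present: returns its value
missing = d.get('words')       # key absent, no default given: returns None
fallback = d.get('words', 37)  # key absent, default given: returns the default

# Variation (an assumption, not part of the answer): skip unknown words entirely.
kept = [d[w] for w in 'I am good words'.split() if w in d]

print(known, missing, fallback, kept)
```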
Upvotes: 1
Reputation: 120391
You can use explode to get the words, then map them with your dict and reshape your dataframe:
MAPPING = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}
df['documents'] = (df['documents'].str.split().explode().map(MAPPING).astype(str)
.groupby(level=0).agg(list).str.join(' '))
print(df)
# Output
id documents
0 0 30 45 90 50
1 1 100 70 110
2 2 30 45 110
Step by step
Phase 1: Explode
# Split phrase into words
>>> out = df['documents'].str.split()
0 [I, am, good, boy]
1 [We, are, going]
2 [I, am, going]
Name: documents, dtype: object
# Explode lists into scalar values
>>> out = out.explode()
0 I
0 am
0 good
0 boy
1 We
1 are
1 going
2 I
2 am
2 going
Name: documents, dtype: object
Phase 2: Transform
# Map each word through your dict, then convert the result to strings
>>> out = out.map(MAPPING).astype(str)
0 30
0 45
0 90
0 50
1 100
1 70
1 110
2 30
2 45
2 110
Name: documents, dtype: object # <- .astype(str)
Phase 3: Reshape
# Group by index (level=0) then aggregate to a list
>>> out = out.groupby(level=0).agg(list)
0 [30, 45, 90, 50]
1 [100, 70, 110]
2 [30, 45, 110]
Name: documents, dtype: object
# Join each list of strings with spaces
>>> out = out.str.join(' ')
0 30 45 90 50
1 100 70 110
2 30 45 110
Name: documents, dtype: object
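The three phases above can be put together as one self-contained script. This is only a sketch: the helper name map_words is hypothetical, and the sample dataframe is borrowed from the other answer, since the original one is not shown in the question.

```python
import pandas as pd

MAPPING = {'I': 30, 'am': 45, 'good': 90, 'boy': 50, 'We': 100, 'are': 70, 'going': 110}

def map_words(documents: pd.Series, mapping: dict) -> pd.Series:
    """Replace every word of each document with its mapped value (hypothetical helper)."""
    # Phase 1: split into words and explode to one word per row (index is repeated).
    words = documents.str.split().explode()
    # Phase 2: translate each word, then convert to strings so they can be joined.
    codes = words.map(mapping).astype(str)
    # Phase 3: regroup by the original row index and join with spaces.
    return codes.groupby(level=0).agg(list).str.join(' ')

# Sample data assumed for illustration.
df = pd.DataFrame({'id': range(3),
                   'documents': ['I am good boy', 'We are going', 'I am going']})
result = map_words(df['documents'], MAPPING)
print(result)
```

One caveat, as far as I can tell: a word that is missing from the mapping becomes NaN after map, which also turns the other numbers into floats, so you may want to handle missing words before calling astype(str).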
Upvotes: 3