LearnerBegineer

Reputation: 23

How to extract data from a column that looks like a dictionary in pandas?

Hi, I am new to pandas/Python and am trying to read a txt file with pandas. I want to extract the key-value pairs from each row, making each key a new column name and its value the entry in that column.

Input

data   
{'Name': 'Tim', 'Class': 'Ninth', 'Hobbies' : 'Football'} 
{'Name': 'Tom', 'Class': 'Ninth', 'Hobbies' : 'Football'}
{'Name': 'Jim', 'Class': 'Ninth', 'Hobbies' : 'Football'}
{'Name': 'John', 'Class': 'Ninth'}

Expected Output:

Name    Class   Hobbies
Tim     Ninth   Football
Tom     Ninth   Football
Jim     Ninth   Football
John    Ninth   NA

Code I tried:

import pandas as pd

df1 = pd.read_csv('9data.txt',sep = '\t')
df1['Name'] = df1['data'].apply(lambda x : x.values()[1])
print(df1)

Error: AttributeError: 'str' object has no attribute 'values'

Is there any way I can do this in pandas?

Upvotes: 0

Views: 51

Answers (1)

Jonathan Leon

Reputation: 5648

Because of the way the data is read, each cell is a string, so I could get a new dataframe using eval(). This iterates over each cell, creates a one-row dataframe from it, and then concatenates them.

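import io

import pandas as pd

data='''data
{'Name': 'Tim', 'Class': 'Ninth', 'Hobbies' : 'Football'}
{'Name': 'Tom', 'Class': 'Ninth', 'Hobbies' : 'Football'}
{'Name': 'Jim', 'Class': 'Ninth', 'Hobbies' : 'Football'}
{'Name': 'John', 'Class': 'Ninth'}'''

# Read the sample text as a single tab-separated column named 'data'
df = pd.read_csv(io.StringIO(data), sep='\t', engine='python')
# eval() turns each string into a dict; json_normalize builds a one-row frame from it
df1 = pd.concat([pd.json_normalize(eval(x)) for x in df['data']])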

Output

   Name  Class   Hobbies
0   Tim  Ninth  Football
0   Tom  Ninth  Football
0   Jim  Ninth  Football
0  John  Ninth       NaN
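
If the file might contain anything other than plain dict literals, one possible variation is to swap eval() for ast.literal_eval(), which only accepts Python literals. This is a sketch rather than part of the original answer; the names raw, parsed and result are made up for illustration, and the two-row sample is a shortened copy of the question's data. Building the frame from the whole list at once also gives a running 0..n-1 index instead of repeated zeros.

```python
import ast
import io

import pandas as pd

# Shortened copy of the question's data (illustrative only)
raw = """data
{'Name': 'Tim', 'Class': 'Ninth', 'Hobbies' : 'Football'}
{'Name': 'John', 'Class': 'Ninth'}"""

df = pd.read_csv(io.StringIO(raw), sep="\t")

# literal_eval parses each string into a dict and rejects anything
# that is not a plain Python literal
parsed = df["data"].str.strip().apply(ast.literal_eval)

# One DataFrame from the list of dicts; keys missing in a row become NaN
result = pd.DataFrame(parsed.tolist())
print(result)
```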

If you can get your data to look like this, there is a simpler method, which Anurag Dabas alludes to. You might consider reading the file into a list first and then creating the dataframe, rather than creating a dataframe from a dataframe.

datal = [{'Name': 'Tim', 'Class': 'Ninth', 'Hobbies' : 'Football'},
{'Name': 'Tom', 'Class': 'Ninth', 'Hobbies' : 'Football'},
{'Name': 'Jim', 'Class': 'Ninth', 'Hobbies' : 'Football'},
{'Name': 'John', 'Class': 'Ninth'}]
df = pd.DataFrame(datal)
df
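
Reading the file into a list first could look roughly like the sketch below. It assumes the file has a header line "data" followed by one dict literal per line, as in the question. The helper read_dict_lines is a hypothetical name, and io.StringIO stands in for the real file so the example runs on its own; with the actual file you would pass open('9data.txt') instead.

```python
import ast
import io

import pandas as pd


def read_dict_lines(handle):
    """Parse one dict literal per line, skipping the 'data' header and blank lines."""
    rows = []
    for line in handle:
        line = line.strip()
        if not line or line == "data":
            continue
        rows.append(ast.literal_eval(line))
    return rows


# Stand-in for the real file (assumption: same layout as the question's input)
sample = io.StringIO(
    "data\n"
    "{'Name': 'Tim', 'Class': 'Ninth', 'Hobbies' : 'Football'}\n"
    "{'Name': 'John', 'Class': 'Ninth'}\n"
)

datal = read_dict_lines(sample)
df = pd.DataFrame(datal)
print(df)
```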

Upvotes: 1
