Reputation: 309
CONTEXT
I am trying to create a DataFrame and fill out columns in that DataFrame based on whether or not the inserted lists have those columns.
Example Data:
Name Height Hair Color Eye Color
Bob 72 Blonde Blue
George 64 Green
John Brown Brown
The DataFrame's columns would cover every variable I want recorded, but if a person is missing information for some columns, I'd like to fill in what I can and leave the rest empty.
Sample Data / Code
name = ['Name', 'Bob']  # Each list holds the associated column name and the value.
height = ['Height', '72']  # Possible to search for height[0] in columns and place height[1] in there?
eye_color = ['Eye Color', 'Brown']
person = [name, height, eye_color]
columns = ['Name', 'Height', 'Hair Color', 'Eye Color']
df = pd.DataFrame(person, columns = columns)
Expected Outcome
Name  Height  Hair Color  Eye Color
Bob   72                  Brown
PROBLEM
I want to be able to pass a person in and fill each column from whatever information is present, leaving any missing columns blank, and then append further people to the DataFrame in the same fashion. Is this possible?
Please let me know if any additional details would help in answering this question!
Upvotes: 0
Views: 336
Reputation: 23773
You can make an empty DataFrame and just specify the columns.
In [21]: df = pd.DataFrame(columns=['name','a','b','c'])
In [22]: df
Out[22]:
Empty DataFrame
Columns: [name, a, b, c]
Index: []
Then you can append
In [23]: df = df.append({'name':'bob','c':0},ignore_index=True)
In [24]: df
Out[24]:
name a b c
0 bob NaN NaN 0
In [25]: df = df.append({'name':'geo','b':'foo'},ignore_index=True)
In [26]: df
Out[26]:
name a b c
0 bob NaN NaN 0
1 geo NaN foo NaN
Multiple rows:
In [32]: more = [{'name':'qq','b':'apples'},
{'name':'wildbill','a':'nickels'},
{'name':'lastone','b':'potatoes','c':16}]
In [33]: df = df.append(more,ignore_index=True)
In [34]: df
Out[34]:
name a b c
0 bob NaN NaN 0
1 geo NaN foo NaN
2 qq NaN apples NaN
3 wildbill nickels NaN NaN
4 lastone NaN potatoes 16
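One caveat for newer installs: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the calls above only run on older versions. A rough equivalent with pd.concat might look like the sketch below. The helper name add_rows is made up for this example and is not part of pandas.

```python
import pandas as pd


def add_rows(frame, rows):
    """Hypothetical helper: add a list of dict rows, keeping frame's columns."""
    # Keys missing from a dict become NaN in the new rows.
    new = pd.DataFrame(rows, columns=frame.columns)
    if frame.empty:
        # Nothing to concatenate with yet, so the new rows are the result.
        return new
    return pd.concat([frame, new], ignore_index=True)


df = pd.DataFrame(columns=['name', 'a', 'b', 'c'])
df = add_rows(df, [{'name': 'bob', 'c': 0}])
df = add_rows(df, [{'name': 'geo', 'b': 'foo'}])

more = [{'name': 'qq', 'b': 'apples'},
        {'name': 'wildbill', 'a': 'nickels'},
        {'name': 'lastone', 'b': 'potatoes', 'c': 16}]
df = add_rows(df, more)
print(df)
```

If many rows arrive one at a time, collecting the dicts in a plain list and building the DataFrame once at the end is usually cheaper than concatenating repeatedly.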
Or if you can ensure all the columns are covered:
In [36]: more
Out[36]:
[{'b': 'apples', 'name': 'qq'},
{'a': 'nickels', 'name': 'wildbill'},
{'b': 'potatoes', 'c': 16, 'name': 'lastone'}]
In [37]: pd.DataFrame(more)
Out[37]:
a b c name
0 NaN apples NaN qq
1 nickels NaN NaN wildbill
2 NaN potatoes 16.0 lastone
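If the dicts might not cover every column, passing columns= to the constructor should still give you the full set, in the order you choose. A small sketch, where the extra column 'd' is invented here only to show a column that no row supplies:

```python
import pandas as pd

more = [{'name': 'qq', 'b': 'apples'},
        {'name': 'wildbill', 'a': 'nickels'},
        {'name': 'lastone', 'b': 'potatoes', 'c': 16}]

# 'd' is a made-up column that none of the dicts contain.
wanted = ['name', 'a', 'b', 'c', 'd']

# Columns are created in the requested order; missing values become NaN.
df_all = pd.DataFrame(more, columns=wanted)
print(df_all)
```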
Looks like DataFrame will consume a generator.
In [3]: more
Out[3]:
[{'b': 'apples', 'name': 'qq'},
{'a': 'nickels', 'name': 'wildbill'},
{'b': 'potatoes', 'c': 16, 'name': 'lastone'}]
In [4]: def f():
...: for d in more:
...: yield d
...:
In [5]: pd.DataFrame(f())
Out[5]:
a b c name
0 NaN apples NaN qq
1 nickels NaN NaN wildbill
2 NaN potatoes 16.0 lastone
There is probably a better way.
Upvotes: 1
Reputation: 16683
Here is a dynamic list comprehension method using the lists you have created in this example:
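import pandas as pd

name = ['Name', 'Bob']
height = ['Height', '72']
eye_color = ['Eye Color', 'Brown']
person = [name, height, eye_color]
columns = ['Name', 'Height', 'Hair Color', 'Eye Color']
# Build one single-key dict per attribute whose label matches a known column.
# Compare against the columns list, since df does not exist yet at this point.
df = pd.DataFrame([{i:j} for (i,j) in zip([name[0], height[0], eye_color[0]],
                                          [name[1], height[1], eye_color[1]])
                   for col in columns if i == col], columns=columns)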
df = df.apply(lambda x: pd.Series(x.dropna().values))
df
Name Height Hair Color Eye Color
0 Bob 72 NaN Brown
Upvotes: 0
Reputation: 2786
Are you open to rethinking what a person object is? If so, you should consider using a dict for each person, as below. It makes your life much easier.
import pandas as pd
columns = ['Name', 'Height', 'Hair Color', 'Eye Color']
df = pd.DataFrame(columns = columns)
person = {'Name':['Bob'], 'Height':['72'], 'Eye Color': ['Brown']}
person2 = {'Name':['Sue'], 'Height':['48'], 'Eye Color': ['Blue'], 'Hair Color': ['Blonde']}
person3 = {'Name':['Hank'], 'Height':['74'], 'Hair Color': ['Black']}
#add persons... could loop through
df = df.append(pd.DataFrame(person))
df = df.append(pd.DataFrame(person2))
df = df.append(pd.DataFrame(person3))
print(df)
Name Height Hair Color Eye Color
0 Bob 72 NaN Brown
0 Sue 48 Blonde Blue
0 Hank 74 Black NaN
If you don't want to change person you can also just make a simple function to convert it:
def person_to_dict(person):
    person_dict = {}
    for attr in person:
        person_dict[attr[0]] = [attr[1]]
    return person_dict
person = person_to_dict(person)
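A shorter variant, assuming each attribute really is a two-item [column, value] list as in the question: dict() can turn a person into a mapping directly, and the DataFrame can be built in one call. That also sidesteps DataFrame.append, which is no longer available in pandas 2.0 and later. The names bob, sue, people and records are only for this sketch.

```python
import pandas as pd

columns = ['Name', 'Height', 'Hair Color', 'Eye Color']

# Each person is a list of [column, value] pairs, as in the question.
bob = [['Name', 'Bob'], ['Height', '72'], ['Eye Color', 'Brown']]
sue = [['Name', 'Sue'], ['Height', '48'], ['Eye Color', 'Blue'],
       ['Hair Color', 'Blonde']]
people = [bob, sue]

# dict() accepts an iterable of two-item pairs.
records = [dict(p) for p in people]

# columns= fixes the column set and order; absent attributes become NaN.
df = pd.DataFrame(records, columns=columns)
print(df)
```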
Upvotes: 1