Reputation: 160
Suppose I have multiple lists:
A = [1, 2, 3]
B = [1, 4]
and I want to generate a Pandas DataFrame in long format as follows:
type | value
------------
A | 1
A | 2
A | 3
B | 1
B | 4
What is the easiest way to achieve this? Going through the wide format and then calling melt does not seem possible(?) because the lists may have different lengths.
Upvotes: 3
Views: 448
Reputation: 164673
Here's a NumPy-based solution using a dictionary input:
import numpy as np
import pandas as pd
d = {'A': [1, 2, 3],
     'B': [1, 4]}
keys, values = zip(*d.items())
res = pd.DataFrame({'type': np.repeat(keys, list(map(len, values))),
                    'value': np.concatenate(values)})
print(res)
  type  value
0    A      1
1    A      2
2    A      3
3    B      1
4    B      4
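To make the two NumPy calls above easier to follow, here is a minimal sketch that runs them separately on the same dictionary. The intermediate names lengths, labels and flat are only illustrative and are not part of the answer's code.

```python
import numpy as np

d = {'A': [1, 2, 3], 'B': [1, 4]}
keys, values = zip(*d.items())

# Number of items in each list, in the same order as the keys.
lengths = [len(v) for v in values]

# Repeat each key once per item in its list.
labels = np.repeat(keys, lengths)

# Join all lists end to end into one flat array.
flat = np.concatenate(values)

print(labels.tolist())  # ['A', 'A', 'A', 'B', 'B']
print(flat.tolist())    # [1, 2, 3, 1, 4]
```

Because both arrays are built in key order, they line up row by row when passed to pd.DataFrame.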
Upvotes: 1
Reputation: 2255
This borrows the gather idea from the R packages dplyr and tidyr. The code below is only a demo, so I create two DataFrames, df1 and df2, by hand; you can create the DataFrames dynamically and concat them:
import pandas as pd
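def gather(df, key, value, cols):
    # Columns that are not gathered stay as identifier columns.
    id_vars = [col for col in df.columns if col not in cols]
    id_values = list(cols)
    var_name = key
    value_name = value
    return pd.melt(df, id_vars=id_vars, value_vars=id_values,
                   var_name=var_name, value_name=value_name)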
df1 = pd.DataFrame({'A': [1, 2, 3]})
df2 = pd.DataFrame({'B': [1, 4]})
df_messy = pd.concat([df1, df2], axis=1)
print(df_messy)
df_tidy = gather(df_messy, 'type', 'value', df_messy.columns).dropna()
print(df_tidy)
This prints the following for df_messy:
   A    B
0  1  1.0
1  2  4.0
2  3  NaN
and for df_tidy:
  type  value
0    A    1.0
1    A    2.0
2    A    3.0
3    B    1.0
4    B    4.0
PS: Remember to convert the values from float back to int. I wrote this only as a demo and did not pay much attention to the details.
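One possible way to do that conversion is sketched below. It assumes every remaining value is a whole number once the NaN rows are dropped, and it calls pd.melt directly instead of the gather wrapper to stay short.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]})
df2 = pd.DataFrame({'B': [1, 4]})
df_messy = pd.concat([df1, df2], axis=1)

# Melt all columns, drop the padding NaN, then cast the values back to integers.
df_tidy = (
    pd.melt(df_messy, var_name='type', value_name='value')
    .dropna()
    .astype({'value': 'int64'})
)
print(df_tidy)
```

The cast has to come after dropna(), since NaN cannot be stored in a plain int64 column.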
Upvotes: 0
Reputation: 862681
Create a dictionary keyed by type, then build a list of tuples with a list comprehension:
import pandas as pd
A = [1, 2, 3]
B = [1, 4]
d = {'A':A,'B':B}
print ([(k, y) for k, v in d.items() for y in v])
[('A', 1), ('A', 2), ('A', 3), ('B', 1), ('B', 4)]
df = pd.DataFrame([(k, y) for k, v in d.items() for y in v], columns=['type','value'])
print (df)
  type  value
0    A      1
1    A      2
2    A      3
3    B      1
4    B      4
Another solution, if the input is a list of lists and the types should be integers:
L = [A,B]
df = pd.DataFrame([(k, y) for k, v in enumerate(L) for y in v], columns=['type','value'])
print (df)
   type  value
0     0      1
1     0      2
2     0      3
3     1      1
4     1      4
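If this is needed in several places, the dictionary variant could be wrapped in a small function. This is only a sketch: the name to_long and the extra empty list 'C' are made up for illustration.

```python
import pandas as pd

def to_long(lists_by_type):
    """Hypothetical helper: flatten {label: list} into a long-format DataFrame."""
    rows = [(label, item)
            for label, items in lists_by_type.items()
            for item in items]
    return pd.DataFrame(rows, columns=['type', 'value'])

# Lists of different lengths are fine; an empty list simply adds no rows.
df = to_long({'A': [1, 2, 3], 'B': [1, 4], 'C': []})
print(df)
```

Because the rows are built one tuple per item, no NaN padding is introduced and the values keep their integer type.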
Upvotes: 1