Reputation: 605
Current Format:
YEAR A B C
2010 0.98 0.17 0.09
2011 0.11 1.00 0.37
2012 0.77 0.29 0.96
2013 0.51 0.93 0.40
2014 0.03 0.61 0.26
2015 0.42 0.55 0.92
Seaborn, from my understanding, expects:
A 2010 0.98
A 2011 0.11
A 2012 0.77
A 2013 0.51
A 2014 0.03
A 2015 0.42
B 2010 0.17
B 2011 1.00
B 2012 0.29
B 2013 0.93
B 2014 0.61
B 2015 0.55
I understand we can convert easily, but someone asked me this question: given that even Excel accepts both formats, why are visualization libraries restricted to the rather unfriendly long format?
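For reference, the conversion mentioned above can be sketched with pandas `melt`. This is only a minimal illustration using the numbers from the table; the column names `series` and `value` are made up for the example and are not something seaborn requires.

```python
import pandas as pd

# Wide format from the question: one column per series (A, B, C).
wide = pd.DataFrame({
    "YEAR": [2010, 2011, 2012, 2013, 2014, 2015],
    "A": [0.98, 0.11, 0.77, 0.51, 0.03, 0.42],
    "B": [0.17, 1.00, 0.29, 0.93, 0.61, 0.55],
    "C": [0.09, 0.37, 0.96, 0.40, 0.26, 0.92],
})

# Long format: one row per (YEAR, series) pair.
# "series" and "value" are hypothetical names chosen for this sketch.
long = wide.melt(id_vars="YEAR", var_name="series", value_name="value")

print(long.head())
```

With the long frame, each column name can be given a role in the plot, for example something like `sns.lineplot(data=long, x="YEAR", y="value", hue="series")`.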
Upvotes: 1
Views: 387
Reputation: 3341
For seaborn, I think the philosophy can be seen in this statement from the introduction:
Notice how we only provided the names of the variables in the dataset and the roles that we wanted them to play in the plot. Unlike when using matplotlib directly, it wasn’t necessary to translate the variables into parameters of the visualization (e.g., the specific color or marker to use for each category). That translation was done automatically by seaborn. This lets the user stay focused on the question they want the plot to answer.
I understand it this way: the idea is to focus on feature names and the relationships between them. Most of the time, users rely on pandas to build the data matrix, which encourages you to give the columns names, i.e. feature names. It may be easier for most people to process the A column with numpy or pandas. That is not a fact, just my feeling about it today.
Or maybe most of us grew up with Excel...
Edit: now I realize I’m just repeating ImportanceOfBeingErnest’s comment...
Upvotes: 3