Normalizing (Un-Crosstabbing) Crosstabbed CSV Data TablesΒΆ
un-xtab.py is a Python module and command-line program that rearranges data in a CSV file from a crosstabbed format to a normalized format. It takes data that are in this form:
Station | 2006-05-23 | 2006-06-15 | 2006-07-19 |
---|---|---|---|
WQ-01 | 4.5 | 3.7 | 6.8 |
WQ-02 | 9.7 | 5.1 | 7.2 |
WQ-03 | 10 | 6.1 | 8.8 |
And rearranges it into this form:
Station | Date | Value |
---|---|---|
WQ-01 | 2006-05-23 | 4.5 |
WQ-02 | 2006-05-23 | 3.7 |
WQ-03 | 2006-05-23 | 6.8 |
WQ-01 | 2006-06-15 | 9.7 |
WQ-02 | 2006-05-15 | 5.1 |
WQ-03 | 2006-06-15 | 7.2 |
WQ-01 | 2006-07-19 | 10 |
WQ-02 | 2006-07-19 | 6.1 |
WQ-03 | 2006-07-19 | 8.8 |
You can use the un-xtab program to rearrange data that have been provided in a format designed for readability into a format that is more suitable for storage in a database, or for use with statistical, modeling, graphics, or other software.
The un-xtab program can deal with crosstabbed formats that include multiple rows of column headers and groups of data values that were crosstabbed together. For example, un-xtab can convert something like this:
Laboratory Name | |||||||||
---|---|---|---|---|---|---|---|---|---|
Project Name | |||||||||
Arsenic | Cadmium | Copper | |||||||
mg/kg | mg/kg | mg/kg | |||||||
dry | dry | dry | |||||||
Client sample ID | Lab sample ID | Preparation date | Analysis date | Concentration | Qualifier | Concentration | Qualifier | Concentration | Qualifier |
SD00231 | 45298-01 | 2013-06-23 | 2013-06-24 | 0.11 | U | 112 | 251 | ||
SD00232 | 45298-02 | 2013-06-23 | 2013-06-24 | 8.21 | 582 | 396 | |||
SD00233 | 45298-03 | 2013-06-23 | 2013-06-24 | 0.11 | U | 45.1 | 2.4 | U | |
SD00235 | 45298-04 | 2013-06-23 | 2013-06-24 | 0.11 | U | 69 | 2.4 | U | |
SD00236 | 45298-05 | 2013-06-23 | 2013-06-24 | 4.53 | 98.9 | 85.8 |
to this:
Client sample ID | Lab sample ID | Preparation date | Analysis date | Analyte | Units | Measurement basis | Concentration | Qualifier |
---|---|---|---|---|---|---|---|---|
SD00231 | 45298-01 | 2013-06-23 | 2013-06-24 | Arsenic | mg/kg | dry | 0.11 | U |
SD00231 | 45298-01 | 2013-06-23 | 2013-06-24 | Cadmium | mg/kg | dry | 112 | |
SD00231 | 45298-01 | 2013-06-23 | 2013-06-24 | Copper | mg/kg | dry | 251 | |
SD00232 | 45298-02 | 2013-06-23 | 2013-06-24 | Arsenic | mg/kg | dry | 8.21 | |
SD00232 | 45298-02 | 2013-06-23 | 2013-06-24 | Cadmium | mg/kg | dry | 582 | |
SD00232 | 45298-02 | 2013-06-23 | 2013-06-24 | Copper | mg/kg | dry | 396 | |
SD00233 | 45298-03 | 2013-06-23 | 2013-06-24 | Arsenic | mg/kg | dry | 0.11 | U |
SD00233 | 45298-03 | 2013-06-23 | 2013-06-24 | Cadmium | mg/kg | dry | 45.1 | |
SD00233 | 45298-03 | 2013-06-23 | 2013-06-24 | Copper | mg/kg | dry | 2.4 | U |
SD00235 | 45298-04 | 2013-06-23 | 2013-06-24 | Arsenic | mg/kg | dry | 0.11 | U |
SD00235 | 45298-04 | 2013-06-23 | 2013-06-24 | Cadmium | mg/kg | dry | 69 | |
SD00235 | 45298-04 | 2013-06-23 | 2013-06-24 | Copper | mg/kg | dry | 2.4 | U |
SD00236 | 45298-05 | 2013-06-23 | 2013-06-24 | Arsenic | mg/kg | dry | 4.53 | |
SD00236 | 45298-05 | 2013-06-23 | 2013-06-24 | Cadmium | mg/kg | dry | 98.9 | |
SD00236 | 45298-05 | 2013-06-23 | 2013-06-24 | Copper | mg/kg | dry | 85.8 |