Normalizing (Un-Crosstabbing) Crosstabbed CSV Data TablesΒΆ
un-xtab.py is a Python module and command-line program that rearranges data in a CSV file from a crosstabbed format to a normalized format. It takes data that are in this form:
| Station | 2006-05-23 | 2006-06-15 | 2006-07-19 |
|---|---|---|---|
| WQ-01 | 4.5 | 3.7 | 6.8 |
| WQ-02 | 9.7 | 5.1 | 7.2 |
| WQ-03 | 10 | 6.1 | 8.8 |
And rearranges it into this form:
| Station | Date | Value |
|---|---|---|
| WQ-01 | 2006-05-23 | 4.5 |
| WQ-02 | 2006-05-23 | 3.7 |
| WQ-03 | 2006-05-23 | 6.8 |
| WQ-01 | 2006-06-15 | 9.7 |
| WQ-02 | 2006-05-15 | 5.1 |
| WQ-03 | 2006-06-15 | 7.2 |
| WQ-01 | 2006-07-19 | 10 |
| WQ-02 | 2006-07-19 | 6.1 |
| WQ-03 | 2006-07-19 | 8.8 |
You can use the un-xtab program to rearrange data that have been provided in a format designed for readability into a format that is more suitable for storage in a database, or for use with statistical, modeling, graphics, or other software.
The un-xtab program can deal with crosstabbed formats that include multiple rows of column headers and groups of data values that were crosstabbed together. For example, un-xtab can convert something like this:
| Laboratory Name | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| Project Name | |||||||||
| Arsenic | Cadmium | Copper | |||||||
| mg/kg | mg/kg | mg/kg | |||||||
| dry | dry | dry | |||||||
| Client sample ID | Lab sample ID | Preparation date | Analysis date | Concentration | Qualifier | Concentration | Qualifier | Concentration | Qualifier |
| SD00231 | 45298-01 | 2013-06-23 | 2013-06-24 | 0.11 | U | 112 | 251 | ||
| SD00232 | 45298-02 | 2013-06-23 | 2013-06-24 | 8.21 | 582 | 396 | |||
| SD00233 | 45298-03 | 2013-06-23 | 2013-06-24 | 0.11 | U | 45.1 | 2.4 | U | |
| SD00235 | 45298-04 | 2013-06-23 | 2013-06-24 | 0.11 | U | 69 | 2.4 | U | |
| SD00236 | 45298-05 | 2013-06-23 | 2013-06-24 | 4.53 | 98.9 | 85.8 |
to this:
| Client sample ID | Lab sample ID | Preparation date | Analysis date | Analyte | Units | Measurement basis | Concentration | Qualifier |
|---|---|---|---|---|---|---|---|---|
| SD00231 | 45298-01 | 2013-06-23 | 2013-06-24 | Arsenic | mg/kg | dry | 0.11 | U |
| SD00231 | 45298-01 | 2013-06-23 | 2013-06-24 | Cadmium | mg/kg | dry | 112 | |
| SD00231 | 45298-01 | 2013-06-23 | 2013-06-24 | Copper | mg/kg | dry | 251 | |
| SD00232 | 45298-02 | 2013-06-23 | 2013-06-24 | Arsenic | mg/kg | dry | 8.21 | |
| SD00232 | 45298-02 | 2013-06-23 | 2013-06-24 | Cadmium | mg/kg | dry | 582 | |
| SD00232 | 45298-02 | 2013-06-23 | 2013-06-24 | Copper | mg/kg | dry | 396 | |
| SD00233 | 45298-03 | 2013-06-23 | 2013-06-24 | Arsenic | mg/kg | dry | 0.11 | U |
| SD00233 | 45298-03 | 2013-06-23 | 2013-06-24 | Cadmium | mg/kg | dry | 45.1 | |
| SD00233 | 45298-03 | 2013-06-23 | 2013-06-24 | Copper | mg/kg | dry | 2.4 | U |
| SD00235 | 45298-04 | 2013-06-23 | 2013-06-24 | Arsenic | mg/kg | dry | 0.11 | U |
| SD00235 | 45298-04 | 2013-06-23 | 2013-06-24 | Cadmium | mg/kg | dry | 69 | |
| SD00235 | 45298-04 | 2013-06-23 | 2013-06-24 | Copper | mg/kg | dry | 2.4 | U |
| SD00236 | 45298-05 | 2013-06-23 | 2013-06-24 | Arsenic | mg/kg | dry | 4.53 | |
| SD00236 | 45298-05 | 2013-06-23 | 2013-06-24 | Cadmium | mg/kg | dry | 98.9 | |
| SD00236 | 45298-05 | 2013-06-23 | 2013-06-24 | Copper | mg/kg | dry | 85.8 |