May 23, 2016

58. Parsing XML documents in Python

Sample XML file: catalogo.xml
  1. DOM and XML tutorial (in English).
  2. Further reading.
  3. Spanish Government open data (Datos abiertos Gobierno de España)

Parsing XML content with DOM

To write the program in Python we use the W3C DOM standard. The document is loaded as an object through the xml.dom package, specifically its minidom module, a minimal implementation of the DOM interface that gives simple, direct access to the document.
NOTE: the Python script and the sample file cd_catalogo.xml (download it) must be in the same folder. The scripts open the file by a relative name, so run them from that folder.
#!/usr/bin/env python3
# coding: utf-8

from xml.dom.minidom import parse

# Open the XML document with the minidom parser
DOMTree = parse("cd_catalogo.xml")  # -> Document Object Model, as a tree
collection = DOMTree.documentElement  # -> root element
print("The collection name is: %s\n" % collection.tagName)

# Get a list of the elements with the CD tag
cds = collection.getElementsByTagName("CD")

# Print every detail of each CD
for cd in cds:
    print("*****CD*****")
    titulo = cd.getElementsByTagName("TITULO")[0]
    print("Title: %s" % titulo.childNodes[0].data)
    artista = cd.getElementsByTagName("ARTISTA")[0]
    print("Artist: %s" % artista.childNodes[0].data)
    pais = cd.getElementsByTagName("PAIS")[0]
    print("Country: %s" % pais.childNodes[0].data)
    # Assumed tag name for the record company; check it against cd_catalogo.xml
    comp = cd.getElementsByTagName("COMPANIA")[0]
    print("Company: %s" % comp.childNodes[0].data)
    precio = cd.getElementsByTagName("PRECIO")[0]
    print("Price: %s €" % precio.childNodes[0].data)
    anno = cd.getElementsByTagName("ANNO")[0]
    print("Year: %s" % anno.childNodes[0].data)
    print("=" * 20 + "\n")
The full reference for xml.dom and its methods is at: https://docs.python.org/2.7/library/xml.dom.html
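The loop above assumes that every CD contains every tag and that its text sits in childNodes[0]; if a tag is missing, the [0] index raises IndexError. Below is a minimal sketch of a more defensive lookup. The helper name texto_de and the inline sample, which only imitates the assumed structure of cd_catalogo.xml, are hypothetical; parseString() lets the sketch run without the file.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical sample imitating the assumed structure of cd_catalogo.xml
MUESTRA = """<CATALOGO>
  <CD>
    <TITULO>Empire Burlesque</TITULO>
    <ARTISTA>Bob Dylan</ARTISTA>
    <PRECIO>10.90</PRECIO>
  </CD>
</CATALOGO>"""


def texto_de(padre, etiqueta):
    """Hypothetical helper: text of the first <etiqueta> under padre, or None if absent."""
    encontrados = padre.getElementsByTagName(etiqueta)
    if not encontrados:
        return None
    # Join every text/CDATA child instead of relying on childNodes[0]
    return "".join(
        hijo.data
        for hijo in encontrados[0].childNodes
        if hijo.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


doc = parseString(MUESTRA)
for cd in doc.documentElement.getElementsByTagName("CD"):
    print("Title:", texto_de(cd, "TITULO"))
    print("Artist:", texto_de(cd, "ARTISTA"))
    print("Company:", texto_de(cd, "COMPANIA"))  # not in the sample -> None
```

An empty element such as <TITULO/> yields an empty string, while a missing one yields None, so the caller can tell the two cases apart.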
= = =

Opening an XML file and printing it on screen

#!/usr/bin/env python3
# coding: utf-8

from xml.dom.minidom import parse

# Open the XML document with the minidom parser, passing the file name
modelo = parse("cd_catalogo.xml")  # -> Document Object Model, as a tree
# Alternatively, pass an open file object (the with block closes it afterwards):
# with open("cd_catalogo.xml", "rb") as fichero:
#     modelo = parse(fichero)

coleccion = modelo.documentElement  # -> root element
print("The collection name is: %s\n" % coleccion.tagName)

print(coleccion.toxml())
# print(coleccion.toprettyxml())  # -> alternative: print the data as an indented block
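As the comments above suggest, parse() accepts either a file name or a file-like object. The short sketch below assumes a small in-memory XML held in io.BytesIO as a stand-in for cd_catalogo.xml; it shows that form of parse() and the difference between serializing the root element and serializing the whole Document.

```python
import io
from xml.dom.minidom import parse

# Hypothetical in-memory stand-in for cd_catalogo.xml (bytes, as read from disk)
datos = io.BytesIO(b"<CATALOGO><CD><TITULO>Empire Burlesque</TITULO></CD></CATALOGO>")

modelo = parse(datos)            # parse() takes a file name or a file-like object
coleccion = modelo.documentElement
print(coleccion.tagName)         # CATALOGO
print(coleccion.toxml())         # root element only, as a str
print(modelo.toxml())            # whole Document, prefixed with the XML declaration
```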
 
= = = 
 

Creating an XML file and saving it

# coding: utf-8

from xml.dom import minidom

Ordenador1 = ['Pentium M', '512MB']
Ordenador2 = ['Pentium Core 2', '1024MB']
Ordenador3 = ['Pentium Core Duo', '1024MB']
listaOrdenadores = [Ordenador1, Ordenador2, Ordenador3]

# Get a DOM implementation object, used to create new documents
DOMimpl = minidom.getDOMImplementation()

# Create the document with estacionesTrabajo as its root element
xmldoc = DOMimpl.createDocument(None, "estacionesTrabajo", None)
doc_root = xmldoc.documentElement

# Walk through the list of computers
for ordenador in listaOrdenadores:

    # Create a node... (*)
    nodo = xmldoc.createElement("Ordenador")

    # Create a child element called Procesador
    elemento = xmldoc.createElement('Procesador')
    # Give it a text node holding position 0 of the list
    elemento.appendChild(xmldoc.createTextNode(ordenador[0]))
    # Append the child element to the node above
    nodo.appendChild(elemento)

    # Same steps for Memoria, using position 1 of the list
    elemento = xmldoc.createElement('Memoria')
    elemento.appendChild(xmldoc.createTextNode(ordenador[1]))
    nodo.appendChild(elemento)

    # (*)... which is appended as a child of doc_root
    doc_root.appendChild(nodo)

# Walk through the list of nodes to print them on screen
listaNodos = doc_root.childNodes
for nodo in listaNodos:
    print(nodo.toprettyxml())

# Save the document to a file. With encoding=..., toprettyxml() returns bytes,
# so the file is opened in binary mode
with open("ordenadores.xml", "wb") as fichero:
    # fichero.write(xmldoc.toxml(encoding="utf-8"))  # -> compact alternative
    fichero.write(xmldoc.toprettyxml(encoding="utf-8"))
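In Python 3, the detail that matters when saving is that toxml() and toprettyxml() return str without an encoding argument and bytes with one. The sketch below rebuilds a shorter, hypothetical list of computers, serializes it with encoding="utf-8" and parses the bytes back to check the round trip. The tuple unpacking is only a stylistic alternative to indexing with [0] and [1].

```python
from xml.dom import minidom

# Hypothetical, shortened version of listaOrdenadores
ordenadores = [("Pentium M", "512MB"), ("Pentium Core 2", "1024MB")]

impl = minidom.getDOMImplementation()
xmldoc = impl.createDocument(None, "estacionesTrabajo", None)
raiz = xmldoc.documentElement
for procesador, memoria in ordenadores:
    nodo = xmldoc.createElement("Ordenador")
    for etiqueta, valor in (("Procesador", procesador), ("Memoria", memoria)):
        hijo = xmldoc.createElement(etiqueta)
        hijo.appendChild(xmldoc.createTextNode(valor))
        nodo.appendChild(hijo)
    raiz.appendChild(nodo)

# With encoding=..., toprettyxml() returns bytes and names the encoding in the declaration
datos = xmldoc.toprettyxml(indent="  ", encoding="utf-8")
print(type(datos))

# Round trip: parse the bytes back and read the processors
copia = minidom.parseString(datos)
procesadores = [p.firstChild.data.strip() for p in copia.getElementsByTagName("Procesador")]
print(procesadores)
```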
 
= = = 
 

Inspecting the elements under the first tag of cd_catalogo.xml

# coding: utf-8

from xml.dom import minidom

xmldoc = minidom.parse('cd_catalogo.xml')
print(xmldoc)
grammarNode = xmldoc.firstChild
print(grammarNode)  # -> CATALOGO (assuming nothing, such as a comment, precedes the root)

refNode = grammarNode.childNodes[1]
print(refNode)  # -> Level 1, the first CD tag (childNodes[0] is the whitespace before it)
print(refNode.childNodes)  # -> Everything under the CD tag, whitespace text nodes included

pNode = refNode.childNodes[3]
print(pNode)  # -> Fourth child node: with text nodes in between, the second element, "ARTISTA"
print(pNode.toxml())  # -> That node serialized as XML

print(pNode.firstChild)  # -> The text inside "ARTISTA", as a Text node object
print(pNode.firstChild.data)  # -> The string extracted from that object
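The indices 1 and 3 above come from the whitespace text nodes that indentation leaves between elements. A small sketch with a hypothetical, indented fragment makes those nodes visible and shows one way to skip them; hijos_elemento is an illustrative helper, not part of minidom.

```python
from xml.dom import Node, minidom

# Hypothetical fragment, indented the way cd_catalogo.xml is assumed to be
FRAGMENTO = """<CATALOGO>
  <CD>
    <TITULO>Empire Burlesque</TITULO>
    <ARTISTA>Bob Dylan</ARTISTA>
  </CD>
</CATALOGO>"""

catalogo = minidom.parseString(FRAGMENTO).documentElement
cd = catalogo.childNodes[1]  # childNodes[0] is the whitespace before <CD>
print([n.nodeName for n in cd.childNodes])
# ['#text', 'TITULO', '#text', 'ARTISTA', '#text']


def hijos_elemento(nodo):
    """Hypothetical helper: element children only, skipping text nodes."""
    return [n for n in nodo.childNodes if n.nodeType == Node.ELEMENT_NODE]


print([n.tagName for n in hijos_elemento(cd)])  # ['TITULO', 'ARTISTA']
print(hijos_elemento(cd)[1].firstChild.data)  # Bob Dylan
```

Filtering by nodeType keeps the code independent of how the file happens to be indented.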
 
= = = 
 

Extracting text and modifying it

This example extracts text from the sample file cd_catalogo.xml and modifies it: the first entry, "Empire Burlesque", is renamed to "Imperio Burlesco". The result is written to a new file, cd_catalogo2.xml.
#!/usr/bin/env python3
# coding: utf-8

from xml.dom.minidom import parse

# Open the XML document with the minidom parser, passing the file name
modelo = parse("cd_catalogo.xml")  # -> Document Object Model, as a tree
# Alternatively, pass an open file object (the with block closes it afterwards):
# with open("cd_catalogo.xml", "rb") as fichero:
#     modelo = parse(fichero)

coleccion = modelo.documentElement  # -> root element
print("The collection name is: %s\n" % coleccion.tagName)

cds = coleccion.getElementsByTagName("CD")

for cd in cds:

    titulo = cd.getElementsByTagName("TITULO")
    # titulo is a NodeList even when it holds a single node, so titulo.toxml()
    # fails with AttributeError, whereas titulo[0].toxml() works
    # print(titulo[0].childNodes)  # -> a list of node objects
    print(titulo[0].childNodes[0].data)  # -> the text content
    if titulo[0].childNodes[0].data == "Empire Burlesque":
        print("detected")
        titulo[0].childNodes[0].data = "Imperio Burlesco"  # str is already Unicode in Python 3
        print(titulo[0].childNodes[0].data)  # -> the new text
    print("=" * 20)

# Writing the Document (not just the root element) also emits the XML declaration.
# encoding="utf-8" makes toxml() return bytes, hence the binary mode.
# Opening "cd_catalogo.xml" instead would overwrite the original file.
with open("cd_catalogo2.xml", "wb") as fichero:
    fichero.write(modelo.toxml(encoding="utf-8"))
# print(coleccion.toprettyxml())  # -> alternative: print the data as an indented block
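The same edit can be wrapped in a function that works on any parsed Document. This is a minimal sketch: renombrar_titulo is a hypothetical name, and the inline two-CD catalog stands in for cd_catalogo.xml so the result can be checked without touching any file.

```python
from xml.dom import Node, minidom

# Hypothetical two-entry catalog standing in for cd_catalogo.xml
FRAGMENTO = (
    "<CATALOGO>"
    "<CD><TITULO>Empire Burlesque</TITULO></CD>"
    "<CD><TITULO>Otro titulo</TITULO></CD>"
    "</CATALOGO>"
)


def renombrar_titulo(doc, viejo, nuevo):
    """Hypothetical helper: set every <TITULO> whose text equals viejo to nuevo.

    Returns the number of titles changed.
    """
    cambios = 0
    for titulo in doc.getElementsByTagName("TITULO"):
        texto = titulo.firstChild
        if texto is not None and texto.nodeType == Node.TEXT_NODE and texto.data == viejo:
            texto.data = nuevo  # Text nodes expose a writable data attribute
            cambios += 1
    return cambios


doc = minidom.parseString(FRAGMENTO)
print(renombrar_titulo(doc, "Empire Burlesque", "Imperio Burlesco"))  # 1
print(doc.documentElement.toxml())
```

Returning the count makes it easy to notice when the title being searched for does not exist in the file.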
 
= = =
 

Generating an XML file with DOM

# coding: utf-8

from xml.dom import minidom

doc = minidom.Document()  # -> Create an XML document

doc.appendChild(doc.createComment("Creando documento de ejemplo XML"))  # -> Write a comment

book = doc.createElement('libro')  # -> Create an element in the document
doc.appendChild(book)  # -> Append it as the document's root element

title = doc.createElement('Título')  # -> Create the title element
title.appendChild(doc.createTextNode('D. Quijote de La Mancha'))  # -> Add a text node
book.appendChild(title)  # -> Put it inside the book element

author = doc.createElement('Autor')  # -> Create the Autor element
book.appendChild(author)  # -> Put it inside libro

# XML names cannot contain spaces: 'Nombre y Apellidos' would produce malformed XML
name = doc.createElement('Nombre_y_Apellidos')  # -> Create the full-name element
author.appendChild(name)  # -> Put it inside Autor

firstname = doc.createElement('Nombre')  # -> Create the Nombre element
name.appendChild(firstname)
firstname.appendChild(doc.createTextNode('Miguel'))
name.appendChild(doc.createTextNode('Texto añadido aquí'))  # -> Mixed content: loose text between elements
lastname = doc.createElement('Apellidos')  # -> Create the Apellidos element
name.appendChild(lastname)
lastname.appendChild(doc.createTextNode('De Cervantes Saavedra'))

chapter = doc.createElement('Capítulo')
book.appendChild(chapter)
chapter.setAttribute('number', '1')  # -> Attribute: <Capítulo number="1">
title = doc.createElement('title')
chapter.appendChild(title)
title.appendChild(doc.createTextNode('Capítulo Primero'))

print(doc.toprettyxml(indent=' '))
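minidom's createElement() does not check that a name is a valid XML name, so one way to catch mistakes is to parse the generated text again. The sketch below assumes that a successful minidom.parseString() call is an acceptable well-formedness check (it does not validate against any schema); es_bien_formado is a hypothetical helper. It also reads the number attribute back with getAttribute().

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def es_bien_formado(texto):
    """Hypothetical helper: True if texto parses as XML (well-formedness only, no schema)."""
    try:
        minidom.parseString(texto)
        return True
    except ExpatError:
        return False


doc = minidom.Document()
libro = doc.appendChild(doc.createElement("libro"))  # appendChild returns the child
capitulo = libro.appendChild(doc.createElement("Capítulo"))
capitulo.setAttribute("number", "1")

salida = doc.toxml()
print(es_bien_formado(salida))  # True
copia = minidom.parseString(salida)
print(copia.getElementsByTagName("Capítulo")[0].getAttribute("number"))  # 1

# createElement() accepts a name with spaces, but the result is not well-formed XML
malo = minidom.Document()
malo.appendChild(malo.createElement("Nombre y Apellidos"))
print(es_bien_formado(malo.toxml()))  # False
```

Note that getAttribute() always returns a string, so a numeric attribute has to be converted explicitly if it is needed as a number.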

= =  =

Printing an XML document as a tree

# coding: utf-8

from xml.dom import minidom, Node

def scanNode(node, level=0):
    # Print the node's class name plus its tag (elements) or text (text nodes),
    # indented four spaces per level, then recurse into its children
    msg = node.__class__.__name__
    texto = ""
    if node.nodeType == Node.ELEMENT_NODE:
        msg += ", tag: " + node.tagName
    elif node.nodeType == Node.TEXT_NODE:
        texto = ": " + node.data
    print(" " * level * 4, msg, texto)
    if node.hasChildNodes():  # a method: without the () it is always true
        for child in node.childNodes:
            scanNode(child, level + 1)

doc = minidom.parse('cd_catalogo.xml')
scanNode(doc)
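When scanNode runs on an indented file, many of the printed text nodes contain nothing but line breaks and spaces. A possible variant, sketched below under that assumption, collects the lines in a list (which also makes it easy to test) and skips whitespace-only text; arbol is a hypothetical name and the fragment is an inline stand-in for cd_catalogo.xml.

```python
from xml.dom import Node, minidom


def arbol(nodo, nivel=0, lineas=None):
    """Hypothetical variant of scanNode: returns the tree as a list of lines."""
    if lineas is None:
        lineas = []
    sangria = "    " * nivel  # four spaces per level, as in scanNode
    if nodo.nodeType == Node.ELEMENT_NODE:
        lineas.append(sangria + "Element, tag: " + nodo.tagName)
    elif nodo.nodeType == Node.TEXT_NODE:
        if nodo.data.strip():  # skip indentation-only text nodes
            lineas.append(sangria + "Text: " + nodo.data.strip())
    else:
        lineas.append(sangria + type(nodo).__name__)  # e.g. Document, Comment
    for hijo in nodo.childNodes:  # empty for text nodes, so recursion stops there
        arbol(hijo, nivel + 1, lineas)
    return lineas


doc = minidom.parseString(
    "<CATALOGO>\n  <CD>\n    <TITULO>Empire Burlesque</TITULO>\n  </CD>\n</CATALOGO>"
)
print("\n".join(arbol(doc)))
```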
 
= = =  
 
