Extract RTF element with location

Get technical support of Document .Net in C# and VB.Net
Post by codesp7784@gmail.com »

Hi, I need to extract all the objects from an .rtf file. The file may contain text, images, and arbitrary shapes such as rectangles and lines. Along with each object I need its properties: for text, the font size, font color, font style, font family, and starting coordinates; for images, the coordinates, height, and width, plus the image itself in any valid format such as .jpg or .png; for shapes, the coordinates, fill color, line width, and line color. Can you suggest one of your products, as a .NET assembly, that can do this?

Re: Extract RTF element with location

Post by Oliver »

Hello.

You can do this quite easily with Document .Net.

Unfortunately, I don't have your sample file, but I've prepared a code sample for you. It shows how to get information about every element inside a DOCX (PDF, RTF, HTML) file and how to extract all images from it.

Code:

using SautinSoft.Document;
using SautinSoft.Document.Drawing;
using System.Linq;
using System;
using System.IO;

namespace Example
{
    class Program
    {
        static void Main(string[] args)
        {
            ShowDocumentInfo();
        }

        private static void ShowDocumentInfo()
        {
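            // Any supported format can be loaded here (DOCX, PDF, RTF, HTML).
            // GetFullPath ensures GetDirectoryName returns a real directory
            // rather than an empty string for a bare file name.
            string inpFile = Path.GetFullPath(@"test.rtf");
            DirectoryInfo imgDir = new DirectoryInfo(Path.GetDirectoryName(inpFile)).CreateSubdirectory("Extracted images");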

            DocumentCore dc = DocumentCore.Load(inpFile);
            DocumentPaginator dp = dc.GetPaginator(new PaginatorOptions() { });

            // Total pages
            Console.WriteLine($"Total pages: {dp.Pages.Count}.");

            int pageNum = 0;
            int pictNum = 0;
            foreach (DocumentPage page in dp.Pages)
            {
                pageNum++;
                pictNum = 0;

                // Page info
                Console.WriteLine($"Page: {pageNum}, size: {Math.Round(LengthUnitConverter.Convert(page.PageSetup.PageWidth, LengthUnit.Point, LengthUnit.Millimeter))} mm x " +
                    $"{Math.Round(LengthUnitConverter.Convert(page.PageSetup.PageHeight, LengthUnit.Point, LengthUnit.Millimeter))} mm:");

                Console.WriteLine("The elements on the page:");
                foreach (var elFrame in page.GetElementFrames())
                {
                    Console.WriteLine($"{elFrame.Element.ElementType}, bounds: {elFrame.Bounds}");

                    // Show extra info
                    // Textual elements
                    if (elFrame.Element is Run run)
                    {
                        CharacterFormat cf = run.CharacterFormat;
                        Console.WriteLine($"Font Family: {cf.FontName}, Size: {cf.Size}, Color: {ToHexColor(cf.FontColor)}");
                    }
                    // Picture
                    else if (elFrame.Element is Picture pict)
                    {
                        pictNum++;
                        Console.WriteLine($"Picture format: {pict.ImageData.Format}");

                        // Save picture
                        string pictName = $"Page {pageNum} Pict {pictNum}";

                        switch (pict.ImageData.Format)
                        {
                            case PictureFormat.Jpeg:
                                pictName += ".jpg";
                                break;
                            case PictureFormat.Bmp:
                                pictName += ".bmp";
                                break;

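                            // Any other format also gets a .png extension;
                            // the bytes below are written as-is, not converted.
                            case PictureFormat.Png:
                            default:
                                pictName += ".png";
                                break;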
                        }
                        File.WriteAllBytes(Path.Combine(imgDir.FullName, pictName), pict.ImageData.ImageBytes);
                    }
                    // Shape
                    else if (elFrame.Element is Shape shp)
                    {
                        Console.WriteLine($"Line width: {shp.Outline.Width} pt");
                        if (shp.Outline.Fill is SolidColorBrush outlineBrush)
                            Console.WriteLine($"Line color: {ToHexColor(outlineBrush.Color)}");
                        if (shp.Fill is SolidColorBrush fillBrush)
                            Console.WriteLine($"Fill color: {ToHexColor(fillBrush.Color)}");
                    }
                }
            }
            Console.WriteLine("Press any key ...");
            Console.ReadKey();
        }

        // Formats a color as an "#RRGGBB" hex string.
        private static string ToHexColor(SautinSoft.Document.Color c)
        {
            return String.Format("#{0:X2}{1:X2}{2:X2}", c.R, c.G, c.B);
        }
    }
}

As a result, the console shows every element together with its coordinates:

Image
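
If you need the coordinates in other units, a small helper can convert them. The sketch below is plain .NET and does not use the library. It assumes the element bounds are expressed in points, as the page size is in the sample above; this thread does not confirm that, so please check it against your own output. The names UnitSketch, PointsToMillimeters and PointsToPixels are made up for illustration.

```csharp
using System;

static class UnitSketch
{
    // 1 point = 1/72 inch and 1 inch = 25.4 mm.
    public static double PointsToMillimeters(double points) => points * 25.4 / 72.0;

    // Pixels depend on the target resolution; 96 DPI is only a common default.
    public static double PointsToPixels(double points, double dpi = 96.0) => points * dpi / 72.0;

    static void Main()
    {
        // Hypothetical top-left corner of an element, assumed to be in points.
        double left = 72.0;
        double top = 144.0;

        Console.WriteLine($"Left: {PointsToMillimeters(left)} mm, top: {PointsToMillimeters(top)} mm");
        Console.WriteLine($"Left: {PointsToPixels(left)} px, top: {PointsToPixels(top)} px at 96 DPI");
    }
}
```

Inside the sample above, LengthUnitConverter.Convert already does the point-to-millimeter conversion, so a helper like this is mainly useful for pixel values or in code that does not reference the library.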

I hope this code is helpful. If you need more information, please ask!

Thanks


Last bumped by Anonymous on Sun Jun 20, 2021 3:36 am.